
Saturday, October 17, 2015

How to do Integration tests with Solr 5.x

Please keep in mind that what is described below is valid only if you have a Solr instance with a single core. Thanks to +Alessandro Benedetti for alerting me about this sneaky stuff ;)
I recently migrated a project [1] from Solr 4.x to Solr 5.x (actually Solr 5.3.1), and the only annoying part has been a (small) refactoring of my integration test suite.

Previously, I always used the cool Maven Cargo plugin for running and stopping Solr (ehmm, Jetty with a solr.war deployed in it) before and after my suite. For those who are still using Solr 4.x, the configuration is here [2]. It is just a matter of a single command:

> mvn install

Unfortunately, Solr 5.x is no longer (formally) a web application, so I needed to find another way to run the integration suite. After googling a bit I wasn't able to find a solution, so I asked myself: "How do Solr folks run their integration tests?", and I found this artifact [3] on the Maven repository: solr-test-framework..."well, the name sounds good", I said.

Effectively, I found a lot of ready-made things that do a lot of the work for you. In my case, I only had to change my integration suite superclass a bit; a simple change, actually, because I just had to extend org.apache.solr.SolrJettyTestBase.

This class provides methods for starting and stopping Jetty (yes, still Jetty: even if formally Solr is no longer a JEE web application, in practice it still is, and it comes bundled with a Jetty that provides the HTTP connectivity). Starting the servlet container in your methods is up to you, by means of the several createJetty(...) static methods. In addition, the class provides an @AfterClass annotated method which stops Jetty at the end of the execution, of course only in case it has been previously started.
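
Just to make the mechanism clearer, here's a minimal, hypothetical sketch (not the actual SolRDF superclass) of a suite that extends SolrJettyTestBase: Jetty is started once in a @BeforeClass method, while the inherited @AfterClass method stops it when the suite ends. The class name, the solr home path and the chosen createJetty(...) overload are just examples.

import org.apache.solr.SolrJettyTestBase;
import org.junit.BeforeClass;

public abstract class SolrIntegrationTestCase extends SolrJettyTestBase {

    @BeforeClass
    public static void startSolr() throws Exception {
        // Starts an embedded Jetty serving the Solr instance found under the given
        // (hypothetical) solr home; the superclass will stop it after the suite.
        createJetty("src/test/resources/solr-home");
    }
}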

You can find my code here [4], any feedback is warmly welcome ;) 

--------------------------
[1] https://github.com/agazzarini/SolRDF
[2] pom.xml using the Maven Cargo plugin and Solr 4.10.4
[3] http://mvnrepository.com/artifact/org.apache.solr/solr-test-framework/5.3.1
[4] SolRDF test superclass using the solr-test-framework

Thursday, October 23, 2014

SPARQL Integration tests with jena-nosql

In a previous post I illustrated how to set up a working environment with jena-nosql using either Apache Solr or Apache Cassandra. Now it's time to write some integration tests. 

The goal of the jena-nosql project is to have Apache Jena, one of the most popular RDF frameworks, bound with your favourite NoSQL database.

Among the many things that Jena can do, SPARQL definitely plays an important role, so in my project I want to make sure the data model of the underlying pluggable storages is able to efficiently support all the query language features.

As a first step I need an integration test for running verifiable SPARQL examples. In order to do that I will set up two Jena Models in the @Before method: the first coming from jena-nosql:

final StorageLayerFactory factory = StorageLayerFactory.getFactory();
final Dataset dataset = DatasetFactory.create(factory.getDatasetGraph());   
final Model jenaNoSqlModel = dataset.getDefaultModel();   

and a second using the default in-memory Jena Model:

final Model inMemoryModel = DatasetFactory.createMem().getDefaultModel();   
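
Putting the two snippets above together, the @Before method could look more or less like this (just a sketch with illustrative names, not the real test class):

private StorageLayerFactory factory;
private Model jenaNoSqlModel;
private Model inMemoryModel;

@Before
public void setUp() {
    // Model backed by the pluggable jena-nosql storage (Solr or Cassandra).
    factory = StorageLayerFactory.getFactory();
    jenaNoSqlModel = DatasetFactory.create(factory.getDatasetGraph()).getDefaultModel();

    // Plain in-memory Jena model, used as the reference implementation.
    inMemoryModel = DatasetFactory.createMem().getDefaultModel();
}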

Now what I need is a set of verifiable scenarios, each of which consists of
  • one or more datasets to load
  • a query
  • the corresponding query results
I would need this "triplet" for each scenario...and as you can imagine, that's a huge amount of work!

Fortunately, some time ago I bought a cool book, "Learning SPARQL", which has a lot of downloadable examples. After having another quick look, I realized that was exactly what I needed :)

Each example in the book is associated with three files:
  • a file containing a short dataset
  • a file containing a query
  • a file containing results in a human readable way
Perfect! I don't need the third file, because the verification can be done by comparing the results of the load / query sequence between the jena-nosql model and the in-memory model (assuming the Jena in-memory model is perfectly working).

So before running each scenario both models are loaded with the example dataset:

jenaNoSqlModel.read(dataset, ...);  
inMemoryModel.read(dataset, ...);   

// Make sure data has been loaded and graphs are isomorphic
assertFalse(jenaNoSqlModel.isEmpty());
assertTrue(jenaNoSqlModel.isIsomorphicWith(inMemoryModel));

Once that is done, it's time to execute the query associated with the example and then verify the results on both models:

final Query query = QueryFactory.create(readQueryFromFile(queryFile));
assertTrue(
    ResultSetCompare.isomorphic(
        QueryExecutionFactory.create(query, jenaNoSqlModel).execSelect(),
        QueryExecutionFactory.create(query, inMemoryModel).execSelect()));

For simplicity I'm misusing Jena resources like QueryExecution here (you should close it in a finally block) and I didn't write any exception handling code.
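
For completeness, here's a minimal sketch of the same comparison with the QueryExecution instances properly closed (same variable names as above):

QueryExecution nosqlExecution = null;
QueryExecution inMemoryExecution = null;
try {
    nosqlExecution = QueryExecutionFactory.create(query, jenaNoSqlModel);
    inMemoryExecution = QueryExecutionFactory.create(query, inMemoryModel);
    assertTrue(
        ResultSetCompare.isomorphic(
            nosqlExecution.execSelect(),
            inMemoryExecution.execSelect()));
} finally {
    // Release any resource associated with the two query executions.
    if (nosqlExecution != null) nosqlExecution.close();
    if (inMemoryExecution != null) inMemoryExecution.close();
}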

I'm still working on that, but if you want to have a quick look, here's the code. As explained in previous posts, you can run this test against one of the available storages (Solr or Cassandra):

   > mvn clean install -P cassandra-2x 
   > mvn clean install -P solr-4x 

Saturday, September 06, 2014

Integration tests with jena-nosql and Solr

The main goal of the jena-nosql project is to have a framework for building a set of noSQL storage bindings for Apache Jena.

So the idea is that using a piece of code like this

// Obtain the concrete storage binding (e.g. Solr or Cassandra)
final StorageLayerFactory factory = StorageLayerFactory.getFactory();
final Dataset dataset = DatasetFactory.create(factory.getDatasetGraph());
final Model model = dataset.getDefaultModel().read(...);
final Query query = QueryFactory.create(...);
final ResultSet resultset = QueryExecutionFactory.create(query, dataset).execSelect();
// Release client-side resources associated with the underlying storage
factory.getClientShutdownHook().close();

I'd be able to insert and query some data using the Jena API.
But where? That actually depends on the binding we choose. At the moment, in order to "test" the framework idea, I created two modules: one for Cassandra and the other for Solr.

As you can see, in the project there's a module for doing integration tests. There, before running those tests, I had to find some way to start one storage or the other in a transparent way.

As you probably know, if you have read some other posts of mine, I am a big fan of Apache Maven, and I must say that in these situations it is a great and productive tool.

In the previous post I (briefly) explained how to start the integration test suite with a backing Apache Cassandra. Here I'll do the same but using Apache Solr.

The main difference from Cassandra is that there isn't an official Maven plug-in for Solr. So after googling a bit I decided to use Cargo.

Cargo has a nice and powerful Maven plug-in that I configured within a solr-4x profile in my pom.xml. Doing so I'm able to run

   > mvn clean install -P solr-4x 
 
The very first time you run this command, Maven, as usual, will download all required dependencies, including the Solr war. Once that is done, it will
  • start an embedded Jetty instance with Solr deployed inside;
  • run the integration test suite;
  • stop Jetty.
So, in the end, running the same test suite against one storage or the other is just a matter of using a different Maven profile in the build ;)