Thursday, October 23, 2014

SPARQL Integration tests with jena-nosql

In a previous post I illustrated how to set up a working environment with jena-nosql using either Apache Solr or Apache Cassandra. Now it's time to write some integration tests. 

The goal of the jena-nosql project is to have Apache Jena, one of the most popular RDF frameworks, bound with your favourite NoSQL database.

Among the many things Jena can do, SPARQL definitely plays an important role, so in my project I want to make sure the data model of the underlying pluggable storages can efficiently support all the features of the query language.

As a first step I need an integration test that runs verifiable SPARQL examples. To do that I set up two Jena Models in the @Before method, the first coming from jena-nosql:

final StorageLayerFactory factory = StorageLayerFactory.getFactory();
final Dataset dataset = DatasetFactory.create(factory.getDatasetGraph());   
final Model jenaNoSqlModel = dataset.getDefaultModel();   

and a second using the default in-memory Jena Model:

final Model inMemoryModel = DatasetFactory.createMem().getDefaultModel();   

Now what I need is a set of verifiable scenarios, each one consisting of
  • one or more datasets to load
  • a query
  • the corresponding query results
I would need this "triplet" for each scenario... and as you can imagine, that's a huge amount of work!

Fortunately, some time ago I bought a cool book, "Learning SPARQL", which has a lot of downloadable examples. After taking another quick look, I realized that was exactly what I needed :)

Each example in the book is associated with three files:
  • a file containing a short dataset
  • a file containing a query
  • a file containing results in a human-readable form
Perfect! I don't need the third file because the verification can be done by comparing the outcome of the same load / query sequence on both the jena-nosql model and the in-memory model (assuming the Jena in-memory model works correctly).
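Since the expected results come from the in-memory model, a scenario boils down to a query file plus the datasets it runs against. A minimal sketch of such a holder follows; the Scenario class, its method names and the file names in the usage note are hypothetical, not part of jena-nosql or of the book's examples, and the query file is assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for one scenario: the datasets to load and the query to run.
final class Scenario {
    private final String queryFile;
    private final List<String> datasetFiles;

    Scenario(final String queryFile, final String... datasetFiles) {
        if (datasetFiles.length == 0) {
            throw new IllegalArgumentException("a scenario needs at least one dataset");
        }
        this.queryFile = queryFile;
        // Defensive copy, so later changes to the caller's array do not leak in.
        this.datasetFiles = Collections.unmodifiableList(
                new ArrayList<String>(Arrays.asList(datasetFiles)));
    }

    List<String> datasetFiles() {
        return datasetFiles;
    }

    // Returns the SPARQL text of the query file (assumption: UTF-8 encoding).
    String readQuery() throws IOException {
        return new String(Files.readAllBytes(Paths.get(queryFile)), StandardCharsets.UTF_8);
    }
}
```

With it, a test could build `new Scenario("example.rq", "example.ttl")`, load every entry of `datasetFiles()` into both models, and run the text returned by `readQuery()` against each of them.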

So before running each scenario both models are loaded with the example dataset:

jenaNoSqlModel.read(dataset, ...);  
inMemoryModel.read(dataset, ...);   

// Make sure data has been loaded and graphs are isomorphic
assertFalse(jenaNoSqlModel.isEmpty());
assertTrue(jenaNoSqlModel.isIsomorphicWith(inMemoryModel));

Once that is done, it's time to execute the query associated with the example and then verify the results on both models:

final Query query = QueryFactory.create(readQueryFromFile(queryFile));
assertTrue(
     ResultSetCompare.isomorphic(
          QueryExecutionFactory.create(query, jenaNoSqlModel).execSelect(),
          QueryExecutionFactory.create(query, inMemoryModel).execSelect()));

For simplicity I'm deliberately misusing Jena resources such as QueryExecution here (you should close it in a finally block), and I haven't written any exception handling code.
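The cleanup mentioned above could look like the following sketch, where both executions are closed in finally blocks and the comparison happens while they are still open. The SelectComparison class and the sameResults method are hypothetical names, and the imports assume the com.hp.hpl.jena packages of Jena 2.x (Jena 3 and later moved them under org.apache.jena).

```java
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.sparql.resultset.ResultSetCompare;

// Hypothetical helper: runs the same SELECT query on two models and compares the results.
final class SelectComparison {

    static boolean sameResults(final Query query, final Model first, final Model second) {
        final QueryExecution firstExecution = QueryExecutionFactory.create(query, first);
        try {
            final QueryExecution secondExecution = QueryExecutionFactory.create(query, second);
            try {
                // Result sets are consumed here, before either execution is closed.
                return ResultSetCompare.isomorphic(
                        firstExecution.execSelect(),
                        secondExecution.execSelect());
            } finally {
                secondExecution.close();
            }
        } finally {
            firstExecution.close();
        }
    }
}
```

In the test, the assertion would then become `assertTrue(SelectComparison.sameResults(query, jenaNoSqlModel, inMemoryModel));`.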

I'm still working on that, but if you want to have a quick look here's the code. As explained in previous posts you can run this test against one of the available storages (Solr or Cassandra):

   > mvn clean install -P cassandra-2x 
   > mvn clean install -P solr-4x