Captain Gazza's Log...: cassandra

Thursday, October 23, 2014

SPARQL Integration tests with jena-nosql

In a previous post I illustrated how to set up a working environment with jena-nosql using either Apache Solr or Apache Cassandra. Now it's time to write some integration tests.

The goal of the jena-nosql project is to have Apache Jena, one of the most popular RDF frameworks, bound with your favourite NoSQL database.

Among the many things Jena can do, SPARQL definitely plays an important role, so in my project I want to make sure that the data model of each pluggable storage can efficiently support all the features of the query language.

As a first step I need an integration test that runs verifiable SPARQL examples. To do that, I set up two Jena Models in the @Before method: a first one coming from jena-nosql:

final StorageLayerFactory factory = StorageLayerFactory.getFactory();
final Dataset dataset = DatasetFactory.create(factory.getDatasetGraph());
final Model jenaNoSqlModel = dataset.getDefaultModel();

and a second using the default in-memory Jena Model:

final Model inMemoryModel = DatasetFactory.createMem().getDefaultModel();

Now what I need is a set of verifiable scenarios, each one consisting of

one or more datasets to load
a query
the corresponding query results

I would need this "triplet" for each scenario... and as you can imagine, that's a huge amount of work!

Fortunately, some time ago I bought a cool book, "Learning SPARQL", which comes with a lot of downloadable examples. After another quick look at it, I realized that it was exactly what I needed :)

Each example in the book is associated with three files:

a file containing a short dataset
a file containing a query
a file containing results in a human readable way

Perfect! I don't need the third file, because the verification can be done by comparing the results of the same load / query sequence executed on both the jena-nosql model and the in-memory model (assuming the Jena in-memory model works correctly).
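To make the comparison idea concrete, here is a minimal sketch in plain Java with no Jena types involved. It assumes, purely for illustration, that each query result has been flattened into a list of rows mapping a variable name to the string form of its value; the RowComparison class and its sameRows method are hypothetical names of mine, not part of jena-nosql or Jena. It also ignores blank node matching, which a real RDF comparison has to deal with.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: compares two query results as unordered collections of rows. */
final class RowComparison {

    private RowComparison() { }

    /**
     * Returns true when both results contain the same rows the same number
     * of times, regardless of the order in which the rows were returned.
     */
    static boolean sameRows(List<Map<String, String>> underTest,
                            List<Map<String, String>> reference) {
        return countRows(underTest).equals(countRows(reference));
    }

    // Counts how many times each distinct row occurs. Map equality is based
    // on content, so two rows with the same bindings fall into the same bucket.
    private static Map<Map<String, String>, Integer> countRows(List<Map<String, String>> rows) {
        Map<Map<String, String>, Integer> counts = new HashMap<>();
        for (Map<String, String> row : rows) {
            counts.merge(new HashMap<>(row), 1, Integer::sum);
        }
        return counts;
    }
}
```

The point of the design is that the in-memory result plays the role of the expected value, so no hand-written expectation file is needed for each scenario.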

So before running each scenario both models are loaded with the example dataset:

jenaNoSqlModel.read(dataset, ...);
inMemoryModel.read(dataset, ...);

// Make sure data has been loaded and graphs are isomorphic
assertFalse(jenaNoSqlModel.isEmpty());
assertTrue(jenaNoSqlModel.isIsomorphicWith(inMemoryModel));

Once that's done, it's time to execute the query associated with the example and then verify the results on both models:

final Query query = QueryFactory.create(readQueryFromFile(queryFile));
assertTrue(
        ResultSetCompare.isomorphic(
                QueryExecutionFactory.create(query, jenaNoSqlModel).execSelect(),
                QueryExecutionFactory.create(query, inMemoryModel).execSelect()));

For simplicity, I'm deliberately misusing Jena resources such as QueryExecution here (you should close it in a finally block), and I didn't write any exception handling code.
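As a rough illustration of the "close it in a finally block" advice, here is a sketch in plain Java. The Execution interface is a hypothetical stand-in I made up so the example compiles without Jena; in the real test the two objects to close would be the QueryExecution instances created for the two models.

```java
import java.util.List;

/** Hypothetical stand-in for a closeable query execution; not a Jena type. */
interface Execution {
    List<String> run();
    void close();
}

final class SafeComparison {

    private SafeComparison() { }

    /** Runs both executions, compares their results and always closes both. */
    static boolean sameResults(Execution underTest, Execution reference) {
        try {
            return underTest.run().equals(reference.run());
        } finally {
            // Nested try/finally: the second close still runs
            // even if the first close throws an exception.
            try {
                underTest.close();
            } finally {
                reference.close();
            }
        }
    }
}
```

With this shape, a failing query or a failing comparison still releases both executions before the exception reaches the test runner.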

I'm still working on that, but if you want to have a quick look here's the code. As explained in previous posts you can run this test against one of the available storages (Solr or Cassandra):

   > mvn clean install -P cassandra-2x

   > mvn clean install -P solr-4x

Saturday, September 06, 2014

Integration tests with jena-nosql and Solr

The main goal of the jena-nosql project is to provide a framework for building a set of NoSQL storage bindings for Apache Jena.

So the idea is that using a piece of code like this

final StorageLayerFactory factory = StorageLayerFactory.getFactory();
final Dataset dataset = DatasetFactory.create(factory.getDatasetGraph());
final Model model = dataset.getDefaultModel().read(...);
final Query query = QueryFactory.create(...);
final ResultSet resultset = QueryExecutionFactory.create(query, dataset).execSelect();

factory.getClientShutdownHook().close();

I'd be able to insert and query some data using the Jena API.
But where? That actually depends on the binding we choose. At the moment, in order to "test" the framework idea, I have created two modules: one for Cassandra and the other for Solr.

As you can see in the project, there's a module dedicated to integration tests. For that module I had to find a way to start one storage or another, transparently, before running the tests.

As you probably know if you have read some of my other posts, I am a big fan of Apache Maven, and I must say that in these situations it is a great and productive tool.

In the previous post I (briefly) explained how to start the integration test suite with a backing Apache Cassandra. Here I'll do the same but using Apache Solr.

There isn't an official Maven plug-in for Solr; this is the main difference from Cassandra. So, after googling a bit, I decided to use Cargo.

Cargo has a nice and powerful Maven plug-in that I configured within a solr-4x profile in my pom.xml. Doing so I'm able to run

   > mvn clean install -P solr-4x

The very first time you run this command, Maven, as usual, will download all required dependencies, including the Solr war. Once that's done, it will

start an embedded Jetty instance with Solr deployed inside;
run the integration test suite;
stop Jetty

So, in the end, running the same test suite against one storage or another is just a matter of using a different Maven profile in the build ;)

Thursday, August 21, 2014

Integration tests with jena-nosql and Cassandra

In this post I will illustrate the integration tests infrastructure I used for the jena-nosql project on my GitHub account.

This article has been "copied" from the project Wiki; you can find the original content here.

The core of the project is not tied to a specific storage, so a set of integration tests that run against a (mini) instance of a target storage not known in advance is required in order to verify the functional correctness of each binding.

Within the project you can see a (Maven) module dedicated to integration tests: the jena-nosql-integration-tests. It is configured with the Maven Failsafe plugin [1] to run tests, during the integration-test phase, against a running instance of the target storage.

And here comes the beauty: the target storage is not predefined, but depends on which runtime binding module has been chosen. So basically the same set of integration tests could be run against Cassandra, HBase, Accumulo or another storage.

How can we maintain the same set of tests and at the same time start a test instance of one storage or another? Maven profiles are the answer (at least for me): within the pom.xml of the jena-nosql-integration-tests, I defined one profile for each storage (at the time of writing there's just one profile ;) ).

So for example the cassandra-2x profile declares the usage of the Cassandra Maven Plugin [2] that

starts an embedded instance of Cassandra before all integration tests
stops that instance after the last integration test

So, in the end, if you want to run the integration test suite against Cassandra, just cd to the jena-nosql project directory (where the top level pom.xml is located) and run Maven as follows:

   > mvn clean install -P cassandra-2x

You don't need to set any permission or any configuration because the embedded instance will "live" within the target build folder.

[1] http://maven.apache.org/surefire/maven-failsafe-plugin
[2] http://mojo.codehaus.org/cassandra-maven-plugin

Monday, August 18, 2014

Jena-nosql: A NoSQL adapter for Apache Jena

A few days ago I started this project on GitHub.

https://github.com/agazzarini/jena-nosql

The overall design revolves around the Abstract Factory design pattern [1].
As you can see from the following diagram, the StorageLayerFactory class plays the role of the AbstractFactory and therefore defines the contract that each concrete implementor (i.e. family) must provide in order to create concrete products for a specific kind of storage.

On top of that, each binding module defines the "concrete" layer that is in charge of providing

an implementation of the StorageLayerFactory (i.e. the Concrete Factory);
a concrete implementation of each (abstract) product defined in the diagram below (i.e. the Concrete Products)

Here you can see the same diagram as above but with the "Cassandra" family members (note that only 4 members are shown in order to simplify the diagram)
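For readers who prefer code to diagrams, here is a minimal, self-contained sketch of the Abstract Factory idea. Every name in it (SketchStorageFactory, TripleIndex, the in-memory "family") is hypothetical and chosen only for illustration; the real StorageLayerFactory contract and its products are the ones in the project sources, and they are certainly richer than this.

```java
import java.util.ArrayList;
import java.util.List;

/** Abstract product (hypothetical): something that stores triples. */
interface TripleIndex {
    void add(String subject, String predicate, String object);
    int size();
}

/** Abstract factory (hypothetical): the contract each storage binding implements. */
interface SketchStorageFactory {
    TripleIndex createTripleIndex();
    String storageName();
}

/** Concrete product of an imaginary in-memory family. */
final class InMemoryTripleIndex implements TripleIndex {
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    @Override
    public int size() {
        return triples.size();
    }
}

/** Concrete factory of the same imaginary family. */
final class InMemoryStorageFactory implements SketchStorageFactory {
    @Override
    public TripleIndex createTripleIndex() {
        return new InMemoryTripleIndex();
    }

    @Override
    public String storageName() {
        return "in-memory";
    }
}

/** Client code: it depends only on the abstract types, never on a family. */
final class SketchClient {
    private SketchClient() { }

    static int loadOneTriple(SketchStorageFactory factory) {
        TripleIndex index = factory.createTripleIndex();
        index.add("s", "p", "o");
        return index.size();
    }
}
```

Swapping the storage then means passing a different factory to the client; the client code itself does not change, which is the property the binding modules rely on.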

Feel free to take a look and let me know your thoughts.

Gazza