
Saturday, October 17, 2015

How to do Integration tests with Solr 5.x

Please keep in mind that what is described below is valid only if you have a Solr instance with a single core. Thanks to +Alessandro Benedetti for alerting me about this sneaky detail ;)
I recently migrated a project [1] from Solr 4.x to Solr 5.x (actually Solr 5.3.1), and the only annoying part has been a (small) refactoring of my integration test suite.

Previously, I had always used the cool Maven Cargo plugin for starting and stopping Solr (ehm, Jetty with a solr.war deployed in it) before and after my suite. For those who are still using Solr 4.x, the configuration is here [2]. It is just a matter of a single command:

> mvn install

Unfortunately, Solr 5.x is no longer (formally) a web application, so I needed to find another way to run the integration suite. After googling a bit without finding a solution, I asked myself: "How do the Solr folks run their integration tests?" and I found this artifact [3] in the Maven repository: solr-test-framework..."well, the name sounds good", I said.

There, I found a lot of ready-made things that do a lot of work for you. In my case, I only had to change my integration suite superclass a bit; actually a simple change, because I just had to extend org.apache.solr.SolrJettyTestBase.

This class provides methods for starting and stopping Jetty (yes, still Jetty: even if formally Solr is no longer a JEE web application, in practice it still is, and it comes bundled with a Jetty instance that provides the HTTP connectivity). Starting the servlet container is up to you, by means of one of the several createJetty(...) static methods; the class, in turn, provides an @AfterClass annotated method which stops Jetty at the end of the execution, of course only if it has been previously started.
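
Just to give the idea, a minimal sketch of such a superclass could look like the following (the class name and the Solr home path are purely illustrative, and the exact createJetty(...) overload and its defaults may vary slightly between 5.x versions):

import org.apache.solr.SolrJettyTestBase;
import org.junit.BeforeClass;

// Sketch of an integration test superclass: Jetty is started once,
// before the whole suite; the @AfterClass method inherited from
// SolrJettyTestBase takes care of stopping it.
public abstract class BaseIntegrationTest extends SolrJettyTestBase {

    @BeforeClass
    public static void startSolr() throws Exception {
        // "src/solr" is an assumed Solr home (solr.xml plus the single core);
        // config file and context path are left to their defaults here.
        createJetty("src/solr", null, null);
    }
}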

You can find my code here [4], any feedback is warmly welcome ;) 

--------------------------
[1] https://github.com/agazzarini/SolRDF
[2] pom.xml using the Maven Cargo plugin and Solr 4.10.4
[3] http://mvnrepository.com/artifact/org.apache.solr/solr-test-framework/5.3.1
[4] SolRDF test superclass using the solr-test-framework

Tuesday, February 10, 2015

SPARQL Integration tests with SolRDF

Last year, I got a chance to contribute to a wonderful project, CumulusRDF, an RDF store on a cloud-based architecture. The integration test suite was one of the most interesting tasks I worked on.
 
There, I used JUnit for running some examples coming from Learning SPARQL by Bob DuCharme (O'Reilly, 2013). Both O'Reilly and the author (BTW, thanks a lot) gave me permission to do that in the project.

So, when I set up the first prototype of SolRDF, I wondered how I could create a complete (integration) test suite for doing more or less the same thing...and I came to the obvious conclusion that some of that work could be reused.

Something had to be changed, mainly because CumulusRDF uses Sesame as its underlying RDF framework, while SolRDF uses Jena...but in the end it was a minor change...they are both valid, easy and powerful.

So, for my LearningSPARQL_ITCase I needed:
  • A setup method for loading the example data;
  • A teardown method for cleaning up the store; 
The example data is provided on the Learning SPARQL website in several files; each file can contain a small dataset, a query, or an expected result (in tabular format). So, returning to my tests, the flow should load the small dataset X, run the query Y and verify the results Z, as sketched below.
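
Just to fix the ideas, the skeleton of the test case looks more or less like this (a rough sketch: method names are illustrative, and the method bodies correspond to the snippets shown in the rest of this post):

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class LearningSparql_ITCase {

    @Before
    public void setUp() {
        // load the small dataset X, both in SolRDF and in a local memory model
        // (see the DatasetAccessor snippet below)
    }

    @Test
    public void queryWithPrefixes() {
        // run query Y against both datasets and compare the results Z
        // (see the QueryExecution snippet below)
    }

    @After
    public void tearDown() {
        // clean up the store, so that each test method starts from scratch
        // (see the SolrServer snippet at the end of the post)
    }
}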

Although this post illustrates how to load a sample dataset in SolRDF, that is something you would do from the command line, not within a JUnit test. In my tests, instead, I load the data into SolRDF using Jena, with these few lines:

// DatasetAccessor provides access to remote datasets
// using the SPARQL 1.1 Graph Store HTTP Protocol
// (GRAPH_STORE_ENDPOINT_URI points to the SolRDF /rdf-graph-store endpoint)
DatasetAccessor dataset = DatasetAccessorFactory.createHTTP(GRAPH_STORE_ENDPOINT_URI);

// Create a local in-memory dataset and read the example data into its default model
Dataset memoryDataset = DatasetFactory.createMem();
Model memoryModel = memoryDataset.getDefaultModel();
memoryModel.read(dataURL, ...);

// Load the memory model into the remote dataset
dataset.add(memoryModel);

Ok, data has been loaded! In another post I will explain what I did, in SolRDF, for supporting the SPARQL 1.1 Graph Store HTTP Protocol. Keep in mind that the protocol is not fully covered at the moment.

Now, it's time to run a query and check the results. As you can see, I execute the same query twice: once against a memory model and once against SolRDF. In this way, assuming Jena's memory model works perfectly, I can check the results coming from the remote dataset (i.e. from SolRDF) by comparing them with that reference:

final Query query = QueryFactory.create(readQueryFromFile(...));
QueryExecution execution = null;
QueryExecution memExecution = null;
try {
    execution = QueryExecutionFactory.sparqlService(SOLRDF_URL, query);
    memExecution = QueryExecutionFactory.create(query, memoryDataset);

    ResultSet rs = execution.execSelect();
    ResultSet mrs = memExecution.execSelect();
    assertTrue(ResultSetCompare.isomorphic(rs, mrs));
} catch (...) {
    ...
} finally {
    // Close both executions
    if (execution != null) execution.close();
    if (memExecution != null) memExecution.close();
}

After that, the RDF store needs to be cleared. Although the Graph Store protocol would come to our aid here, it cannot be fully implemented in Solr because some HTTP methods (i.e. PUT and DELETE) cannot be used in RequestHandlers: SolrRequestParsers, which is the first handler of incoming requests, allows those methods only for /schema and /config requests. So, while a clean-up could easily be done with something like this:

dataset.deleteDefault();

Or, in HTTP:

DELETE /rdf-graph-store?default HTTP/1.1
Host: example.com

I cannot implement such behaviour in Solr. After checking the SolrConfig and SolrRequestParsers classes, I believe the simplest (although not RDF-ish) way to clean up the store is:

SolrServer solr = new HttpSolrServer(SOLRDF_URI);
solr.deleteByQuery("*:*");
solr.commit();

I know, that has nothing to do with RDF or the Graph Store protocol, but I don't want to change the Solr core, and at the moment it represents a good compromise. After all, that code resides only in my tests.

That's all! I just merged all that stuff into master, so feel free to have a look. If you want to run the integration test suite, you can do that from the command line:

# cd $SOLRDF_HOME
# mvn clean install

or in Eclipse, using the predefined Maven launch configuration solrdf/src/dev/eclipse/run-integration-test-suite.launch. Just right-click on that file and choose "Run as..."

Regardless of the way you choose, you will see messages like these:

(build section)

[INFO] -----------------------------------------------------------------
[INFO] Building Solr RDF plugin 1.0
[INFO] -----------------------------------------------------------------


(unit tests section)

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

...
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.691 sec
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0

(cargo section. It starts the embedded Jetty)

[INFO] [beddedLocalContainer] Jetty 7.6.15.v20140411 Embedded starting...
...
[INFO] [beddedLocalContainer] Jetty 7.6.15.v20140411 Embedded started on port [8080]

(integration tests section)

------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.gazzax.labs.solrdf.integration.LearningSparql_ITCase
[INFO]  Running Query with prefixes test...
[INFO]  [store] webapp=/solr path=/rdf-graph-store params={default=} status=0 QTime=712
...

[DEBUG] : Query type 222, incoming Accept header... 

(end)

[INFO]  [store] Closing main searcher on request.
...

[INFO] [beddedLocalContainer] Jetty 7.6.15.v20140411 Embedded is stopped
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 42.302s
[INFO] Finished at: Tue Feb 10 18:19:21 CET 2015
[INFO] Final Memory: 39M/313M
[INFO] --------------------------------------------------


Best,
Andrea 

Thursday, August 21, 2014

Integration tests with jena-nosql and Cassandra

In this post I will illustrate the integration test infrastructure I used for the jena-nosql project on my GitHub account.

This article has been "copied" from the project Wiki, you can find the original content here.

The core of the project itself is not bound to a specific storage, so a set of integration tests that runs against a (mini) instance of a target storage not known in advance is required, in order to verify the functional correctness of each binding.

Within the project you can see a (Maven) module dedicated to integration tests: the jena-nosql-integration-tests. It is configured with the Maven Failsafe plugin [1] to run tests, during the integration-test phase, against a running instance of the target storage.

And here comes the beauty: the target storage is not predefined, but depends on which runtime binding module has been chosen. So basically the same set of integration tests can be run against Cassandra, HBase, Accumulo or any other storage.

How can we maintain the same set of tests and at the same time start a test instance of one storage or another? Maven profiles are the answer (at least for me): within the pom.xml of jena-nosql-integration-tests, I defined one profile for each storage (at the time of writing there's just one profile ;) ).

So, for example, the cassandra-2x profile declares the usage of the Cassandra Maven Plugin [2], which
  • starts an embedded instance of Cassandra before all integration tests
  • stops that instance after the last integration test
So, in the end, if you want to run the integration test suite against Cassandra, just cd to the jena-nosql project directory (where the top-level pom.xml is located) and run Maven as follows:

   > mvn clean install -P cassandra-2x 
 
You don't need to set any permission or any configuration because the embedded instance will "live" within the target build folder.

[1] http://maven.apache.org/surefire/maven-failsafe-plugin
[2] http://mojo.codehaus.org/cassandra-maven-plugin