In a previous post I explained how to set up SolRDF in two minutes, using Maven to build and install everything automatically.
Once installed, you can index data by issuing a command like this:
> curl -v "http://localhost:8080/solr/store/update/bulk?commit=true" -H "Content-Type: application/n-triples" --data-binary @/path-to-your-data/data.nt
and then execute a SPARQL query like this:
> curl "http://127.0.0.1:8080/solr/store/sparql" --data-urlencode "q=SELECT * WHERE { ?s ?p ?o } LIMIT 10" -H "Accept: application/sparql-results+xml"
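For readers who prefer scripting to curl, the two commands above can be sketched in Python with only the standard library. This is a minimal sketch and not part of SolRDF: the function names are hypothetical, and the base URL simply reuses the host, port and core name from the curl examples.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: same host, port and core ("store") as in the curl commands.
BASE_URL = "http://127.0.0.1:8080/solr/store"


def build_update_request(ntriples: bytes, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) the bulk update POST carrying N-Triples data."""
    return Request(
        f"{base_url}/update/bulk?commit=true",
        data=ntriples,
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )


def build_sparql_request(query: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a form-encoded SPARQL query, like --data-urlencode."""
    return Request(
        f"{base_url}/sparql",
        data=urlencode({"q": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+xml",
        },
        method="POST",
    )


def send(request: Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running instance, `send(build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10"))` should return the same SPARQL results document as the curl command.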
Now, since all of this runs inside a full-text search engine, why not try to combine some of the cool features of Solr with SPARQL results?
The underlying idea is this: the serialization of SPARQL results is standardized in several W3C documents and therefore cannot be changed, so we need a way to embed those results in another response that carries additional information such as metadata, facets and so on.
The Solr query response sounds perfect for this goal: I only have to replace the content of the <result> section with a <sparql> document (note that I'm talking specifically about the XML response writer, the only one I have implemented so far; other formats are coming in the next episodes...). Running a query like this
/sparql?facet=true&facet.field=p&start=100&rows=10&q=SELECT * WHERE {?s ?p ?o}
I can get the following response (note the mix between SPARQL and Solr results):
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">31</int>
    <str name="query">SELECT * WHERE{ ?s ?p ?o}</str>
  </lst>
  <result name="response" numFound="3875" start="100" maxScore="1.0">
    <sparql>
      <head>
        <variable name="s"/>
        <variable name="p"/>
        <variable name="o"/>
      </head>
      <results>
        <result>
          <binding name="s">
            <uri>http://example/book2</uri>
          </binding>
          ...
        </result>
        ...
      </results>
    </sparql>
  </result>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="p">
        <int name="&lt;http://example.org/ns#price&gt;">231</int>
        <int name="&lt;http://purl.org/dc/elements/1.1/creator&gt;">1432</int>
        <int name="&lt;http://purl.org/dc/elements/1.1/title&gt;">2212</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>
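To give an idea of how a client could consume such a mixed response, here is a rough Python sketch that pulls out the paging information, the SPARQL bindings and the facet counts. It assumes the element layout shown in the sample above, without XML namespaces (a real SPARQL results document may declare one); the function name and the trimmed sample document are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample response (illustrative only).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="3875" start="100" maxScore="1.0">
    <sparql>
      <head>
        <variable name="s"/><variable name="p"/><variable name="o"/>
      </head>
      <results>
        <result>
          <binding name="s"><uri>http://example/book2</uri></binding>
        </result>
      </results>
    </sparql>
  </result>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="p">
        <int name="&lt;http://example.org/ns#price&gt;">231</int>
        <int name="&lt;http://purl.org/dc/elements/1.1/title&gt;">2212</int>
      </lst>
    </lst>
  </lst>
</response>"""


def parse_hybrid_response(xml_text: str) -> dict:
    """Split a hybrid response into paging info, SPARQL rows and facet counts."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")

    rows = []
    for solution in result.findall("sparql/results/result"):
        row = {}
        for binding in solution.findall("binding"):
            if len(binding) == 0:
                continue  # skip empty bindings
            term = binding[0]  # e.g. <uri> or <literal>
            row[binding.get("name")] = (term.tag, (term.text or "").strip())
        rows.append(row)

    facets = {}
    path = "lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    for field in root.findall(path):
        facets[field.get("name")] = {
            count.get("name"): int(count.text) for count in field.findall("int")
        }

    return {
        "num_found": int(result.get("numFound")),
        "start": int(result.get("start")),
        "variables": [v.get("name") for v in result.findall("sparql/head/variable")],
        "rows": rows,
        "facets": facets,
    }
```

Calling `parse_hybrid_response(SAMPLE_RESPONSE)` yields one dictionary holding both the Solr side (numFound, start, facets) and the SPARQL side (variables, rows) of the response.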
The first question is: what triggers that hybrid search? I would like to preserve the standard SPARQL endpoint functionality, so a good compromise could be the following:
- if the query string contains only a q parameter, the plain SPARQL endpoint executes the query and returns a standard SPARQL results response;
- if the query string also contains other parameters (at the moment I consider only facet, facet.field, rows and start), a hybrid search is executed, returning results in the mixed format shown above.
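The rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not the actual SolRDF code: the function name is hypothetical, and treating any parameter outside the four listed ones as irrelevant is my assumption.

```python
from urllib.parse import parse_qs

# The only parameters that currently switch the endpoint to hybrid mode.
HYBRID_PARAMS = {"facet", "facet.field", "rows", "start"}


def is_hybrid_search(query_string: str) -> bool:
    """Return True if the request should produce the mixed Solr/SPARQL response."""
    params = parse_qs(query_string, keep_blank_values=True)
    # Assumption: parameters other than the four above do not trigger hybrid mode.
    return any(name in HYBRID_PARAMS for name in params)
```

A request carrying only q keeps behaving like a plain SPARQL endpoint, while adding for example rows or facet to the same request makes the function return True.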