In the introductory article about SPARQL,
I mentioned that several query languages are being developed for
accessing RDF data. RDF is a key standard for defining and storing metadata in
a Web 2.0 environment. However, most of these query languages are incomplete and fairly
complex, and people often need something simple to start their trek along the
learning curve. Accessing RDF data is simple when you use SquishQL, an RDF query language with SQL-like notation.

The name Squish can be read as “SQL-ish”: the query syntax is designed to resemble
the basic structure of SQL. We ask a database for
possible values of a selection of variables, given some constraining
expression. In Squish, this constraining expression can be thought of as a list
of RDF statements in which some parts of each statement have missing values
(indicated by ‘?’ variables in place of URIs or string
literals).
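To make the idea of “statements with missing values” concrete, here is a minimal Python sketch, not taken from any SquishQL implementation, that matches one pattern against one RDF triple. It assumes that triples and patterns are plain (subject, predicate, object) tuples of strings and that any term starting with ‘?’ is a variable; the function names `is_var` and `match` and sample terms such as `ex:job1` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumption: a triple or a pattern is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    """A term starting with '?' is a variable; anything else is a fixed URI or literal."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Match one pattern against one triple.

    Returns a new, extended copy of the bindings on success, or None if the
    triple does not fit the pattern.
    """
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind an unbound variable; an already bound one must agree with the triple.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # A fixed term must be identical.
            return None
    return result


print(match(("?job", "job:salary", "?salary"), ("ex:job1", "job:salary", "100000"), {}))
# {'?job': 'ex:job1', '?salary': '100000'}
print(match(("?job", "job:salary", "?salary"), ("ex:job1", "job:currency", "USD"), {}))
# None
```

A variable that appears twice in one pattern must receive the same value both times, which is why the sketch checks existing bindings instead of simply overwriting them.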

It is not the only RDF query language with
SQL-based syntax: the Jena
RDF framework
had a similar query language called RDQL, but later moved to SPARQL.
Both SquishQL and RDQL were based on R.V. Guha’s rdfDB query language.

As for the future of SquishQL,
it seems it is not going to become an industry standard, since more powerful and
complex languages such as SPARQL are supported by modern, full-featured Java RDF
frameworks (have a look at Jena, Joseki, or Sesame). However, it remains the simplest query
language for beginners in the RDF query world, much like Pascal in the world of programming
languages.

Note: All of the examples in
this article are available as text files in the downloadable version.

An initial look

This is an example of a typical SquishQL query (example1.txt):

Example 1

SELECT ?item, ?job, ?orghome, ?salary, ?currency

   WHERE (job::advertises ?item ?job)
         (rdf::type ?job wordnet::Job)
         (job::salary ?job ?salary)
         (job::currency ?job ?currency)
         (job::orgHomepage ?job ?orghome)

   USING job FOR http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#
         rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
         wordnet FOR http://xmlns.com/wordnet/1.6/

The answer to a query like this can be
represented as a tabular result set, where columns correspond to the variables
in the query (“?salary”, etc.) and rows
correspond to states of affairs described in the RSS data in which the variables
match values from the dataset. This is very similar to the ODBC/JDBC model
familiar from the relational database world. In addition, the result set can be
viewed as another RDF dataset, i.e. the data graph made up of all the
nodes and arcs involved in answering the query.
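The tabular result set can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not how any particular SquishQL engine works: the WHERE clause is treated as a conjunction evaluated by a naive nested-loop join over an in-memory list of triples, and SELECT projects each row of bindings onto the requested variables. The data below is made up in the spirit of Example 1, the prefixed names (`job:salary`, `ex:job1`) are hypothetical shorthand, and the patterns are stored subject-predicate-object even though Example 1 writes them predicate first.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]


def match(pattern: Triple, triple: Triple, row: Row) -> Optional[Row]:
    """Extend a row of bindings so that the pattern matches the triple, or return None."""
    new = dict(row)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns: Sequence[Triple], data: Sequence[Triple]) -> List[Row]:
    """WHERE clause as a conjunction: every row must satisfy every pattern.

    Naive nested-loop join: each pattern extends every row built so far.
    """
    rows: List[Row] = [{}]
    for pattern in patterns:
        next_rows: List[Row] = []
        for row in rows:
            for triple in data:
                extended = match(pattern, triple, row)
                if extended is not None:
                    next_rows.append(extended)
        rows = next_rows
    return rows


def select(rows: List[Row], variables: Sequence[str]) -> List[Tuple[str, ...]]:
    """SELECT clause: project each row onto the requested variables, in order."""
    return [tuple(row[v] for v in variables) for row in rows]


# Made-up data in the spirit of Example 1, stored as (subject, predicate, object).
data = [
    ("job1.html", "job:advertises", "ex:job1"),
    ("ex:job1", "job:salary", "100000"),
    ("ex:job1", "job:currency", "USD"),
    ("job2.html", "job:advertises", "ex:job2"),
    ("ex:job2", "job:salary", "150000"),
    ("ex:job2", "job:currency", "EUR"),
]
patterns = [
    ("?item", "job:advertises", "?job"),
    ("?job", "job:salary", "?salary"),
    ("?job", "job:currency", "?currency"),
]
for row in select(evaluate(patterns, data), ["?item", "?salary", "?currency"]):
    print(row)
# ('job1.html', '100000', 'USD')
# ('job2.html', '150000', 'EUR')
```

Each printed tuple is one row of the result table; a real engine would typically reorder patterns or use indexes rather than scanning the whole dataset for every row.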

Each row in the result set gives a concrete value
for every variable named in the SELECT clause, such as “?item”. The
variable itself is a placeholder rather than a specific Web resource; the
constraints in the WHERE clause specify some of the properties of the resource
the variable is ‘standing in for’.

Here is another simple query. First we
present the SQL-ish query, followed by a prose
translation (example2.txt):

Example 2

SELECT ?x, ?t, ?c, ?o

WHERE   (dc::title ?x ?t)
        (dc::creator ?x ?c)
        (eg::homePage ?c http://purl.org/net/eric/)
        (eg::worksFor ?c ?o)

USING dc FOR http://purl.org/dc/elements/1.1/
      eg FOR http://example.com/vocab/foaf/

This is what we’re trying to say with this
query:

“find me the
dc title (we’ll call it ‘t’) of any resource (we’ll call it ‘x’) that has a dc
creator ‘c’ with a homepage ‘http://purl.org/net/eric/’,
and tell me who they work for (‘o’)”.

The answer is (just as in the SQL world) a
table, with columns corresponding to the things we asked for, i.e. ‘x’, ‘t’, ‘c’, ‘o’.
Each row supplies one set of values from the database that match the
constraints in the ‘WHERE’ clause of the query. Here is a tabular representation
of a possible result set from our main example, Example 1 (results1.txt):

Results1

item       job               orghome                  salary  currency
---------  ----------------  -----------------------  ------  --------
job1.html  job 1 title here  http://www.ukoln.ac.uk/  100000  USD
job2.html  job 2 title here  http://ilrt.org//        150000  EUR

How it works

In SquishQL,
there are two classes of constraints: patterns and filter expressions. Patterns
are generative, i.e. they create bindings, while filters are restrictive,
i.e. they remove possibilities. SquishQL separates
these into the WHERE clause (generative) and the AND clause (restrictive). Some
query systems follow the tradition of putting the predicate first, and Examples 1 and 2 above
are written this way, as (predicate subject object). The style that mimics the N-Triples syntax,
used in Example 3 below, instead specifies triple patterns as subject-predicate-object.
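The difference between generative patterns and restrictive filters can be shown with a small Python sketch. It is a simplified illustration, not SquishQL's evaluation algorithm: `generate` creates one row of bindings per matching triple (the WHERE side), and `restrict` keeps only rows that pass a condition (the AND side). The data is made up, and the Python lambda merely stands in for an AND expression comparing ?salary with a number; it is not SquishQL filter syntax.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]


def generate(pattern: Triple, data: List[Triple]) -> List[Row]:
    """Generative step (WHERE): every triple that fits the pattern yields a new row."""
    rows: List[Row] = []
    for triple in data:
        row: Row = {}
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                if row.setdefault(p_term, t_term) != t_term:
                    break  # the same variable would need two different values
            elif p_term != t_term:
                break  # a fixed term does not match
        else:
            rows.append(row)  # loop finished without a break: the triple fits
    return rows


def restrict(rows: List[Row], condition: Callable[[Row], bool]) -> List[Row]:
    """Restrictive step (AND): keep only rows that satisfy the condition."""
    return [row for row in rows if condition(row)]


data = [
    ("ex:job1", "job:salary", "100000"),
    ("ex:job2", "job:salary", "150000"),
    ("ex:job2", "job:currency", "EUR"),
]
rows = generate(("?job", "job:salary", "?salary"), data)
print(len(rows))  # 2
print(restrict(rows, lambda r: int(r["?salary"]) > 120000))
# [{'?job': 'ex:job2', '?salary': '150000'}]
```

Note that `restrict` can only shrink the list it is given: a filter never introduces a new binding, which is exactly why the two kinds of constraint live in separate clauses.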

In SQL, a database is a closed world: the
FROM clause identifies the tables in the database, and the WHERE clause
identifies constraints and can be extended with AND. By analogy, the Semantic Web
is the database, and the FROM clause identifies the RDF models. Variables are
introduced by a leading “?”, and URIs are
quoted with “<>”; unquoted URIs can
be used where there is no ambiguity.
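The term rules just described can be captured by a tiny classifier. This Python sketch is an assumption-laden illustration, not a SquishQL parser: it recognizes only variables, “<>”-quoted URIs, and bare unquoted URIs, and it deliberately ignores literals, whose syntax the text does not spell out. The function name `classify` is hypothetical.

```python
from typing import Tuple


def classify(term: str) -> Tuple[str, str]:
    """Classify one query term as ('variable', name) or ('uri', uri).

    Literals are not handled in this sketch.
    """
    if term.startswith("?"):
        return ("variable", term[1:])  # a leading '?' introduces a variable
    if term.startswith("<") and term.endswith(">"):
        return ("uri", term[1:-1])  # quoted URI: strip the angle brackets
    return ("uri", term)  # unquoted URI, allowed where there is no ambiguity


for t in ["?c", "<http://somewhere.com/aBag>", "http://purl.org/net/eric/"]:
    print(classify(t))
# ('variable', 'c')
# ('uri', 'http://somewhere.com/aBag')
# ('uri', 'http://purl.org/net/eric/')
```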

These are the main elements of a query:

  • SELECT Clause: Identifies the variables to be returned to
    the application. If the application does not need all of the variables,
    selecting only the required ones can reduce the amount of memory
    needed for the result set and also gives a query optimizer useful
    information.
  • FROM Clause: Specifies the model to query, by URI.
  • WHERE Clause: Specifies the graph pattern as a
    conjunction of triple patterns.
  • AND Clause: Specifies Boolean filter expressions over the
    values of URIs and literals, including
    arithmetic comparisons, disjunction and
    negation.
  • USING Clause: A way to shorten URIs. Because SquishQL queries are likely
    to be written by hand, this mechanism makes the syntax easier to
    read. It is not a namespace mechanism; it is
    simply an abbreviation mechanism for long URIs
    that works by defining a string prefix.
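Because USING is a plain string-prefix abbreviation rather than a namespace mechanism, expanding a term amounts to concatenating two strings. The Python sketch below assumes the `prefix::local` form used in Examples 1 and 2; the function name `expand` is hypothetical, and the prefix table simply copies the USING clause of Example 1.

```python
from typing import Dict


def expand(term: str, using: Dict[str, str]) -> str:
    """Expand a 'prefix::local' abbreviation by plain string concatenation.

    Terms without a known prefix (variables, full URIs) are returned unchanged.
    """
    prefix, sep, local = term.partition("::")
    if sep and prefix in using:
        return using[prefix] + local
    return term


# The USING clause of Example 1, as a prefix -> string table.
using = {
    "job": "http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "wordnet": "http://xmlns.com/wordnet/1.6/",
}
print(expand("job::salary", using))
# http://ilrt.org/discovery/2000/11/rss-query/jobvocab.rdf#salary
print(expand("wordnet::Job", using))
# http://xmlns.com/wordnet/1.6/Job
print(expand("?job", using))
# ?job
```

No validation of the resulting URI takes place: whatever string the prefix maps to is glued to the local part, which is what distinguishes this from a true namespace mechanism.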

The RDF specification defines forms for
containers and for reification, but there is no explicit syntax for either in SquishQL. As Example 3 below shows, this does not prevent
retrieving data from containers, but the query can become cumbersome.
Similarly, the lack of syntactic support for reification can make expressing
some queries awkward.

This is how the contents of an RDF bag can
be extracted (example3.txt):

Example 3

SELECT ?y
WHERE (<http://somewhere.com/aBag>, ?x, ?y)
AND ! ( ?x eq <rsyn:type> && ?y eq <rsyn:Bag>)
USING
  rsyn FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
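To see what Example 3 actually computes, here is a Python sketch that performs the same two steps over a made-up bag with two members. The member URIs under `http://example.org/` are hypothetical, and the function `bag_contents` is an illustration of the query's logic, not code from any SquishQL engine.

```python
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BAG = "http://somewhere.com/aBag"

# Made-up data: a resource typed as rdf:Bag with two members, rdf:_1 and rdf:_2.
data: List[Tuple[str, str, str]] = [
    (BAG, RDF + "type", RDF + "Bag"),
    (BAG, RDF + "_1", "http://example.org/member/a"),
    (BAG, RDF + "_2", "http://example.org/member/b"),
]


def bag_contents(bag: str, triples: List[Tuple[str, str, str]]) -> List[str]:
    # WHERE (<bag>, ?x, ?y): every property ?x and value ?y of the bag.
    rows = [(p, o) for s, p, o in triples if s == bag]
    # AND ! (?x eq rsyn:type && ?y eq rsyn:Bag): drop the "rdf:type rdf:Bag" statement.
    return [o for p, o in rows if not (p == RDF + "type" and o == RDF + "Bag")]


print(bag_contents(BAG, data))
# ['http://example.org/member/a', 'http://example.org/member/b']
```

Because the filter removes only the type statement, any other property attached to the bag would also come back as if it were a member; telling rdf:_1, rdf:_2, ... apart from other properties would need further conditions, which is part of why container queries get cumbersome without dedicated syntax.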

How it works with Inkling

Inkling is a
Java implementation of SquishQL, designed to be API- and
database-independent, that was created to test the usefulness of SquishQL
for comparatively small-scale projects. The aim was to have a query engine that
could be used with almost any RDF database implementation written in Java, and
that could be used for experimenting with the SquishQL
query language.

For Inkling to be able to talk to an RDF
database or service, the service just has to implement an extremely basic
interface consisting of a single method. This method is a three-place search
method:

queryDatabase(subject, predicate, object)

where any argument can be null, leaving that position unconstrained. This was
the lowest common denominator of the methods supported by the various APIs examined.
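The single-method interface is easy to picture in code. The following Python sketch mirrors the shape of `queryDatabase(subject, predicate, object)`, with `None` playing the role of null as a wildcard; it is not Inkling's Java API. The names `TripleSource`, `InMemoryStore`, `query_database`, and `lookup`, as well as the sample data, are all hypothetical.

```python
from typing import Iterable, List, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]


class TripleSource(Protocol):
    """The one-method interface, shaped like queryDatabase(subject, predicate, object)."""

    def query_database(self, subject: Optional[str], predicate: Optional[str],
                       obj: Optional[str]) -> List[Triple]: ...


class InMemoryStore:
    """Toy backend: a list of triples scanned linearly."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples = list(triples)

    def query_database(self, subject: Optional[str], predicate: Optional[str],
                       obj: Optional[str]) -> List[Triple]:
        # None acts as a wildcard for that position.
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]


def lookup(store: TripleSource, pattern: Triple) -> List[Triple]:
    """Map a query pattern onto one search call: variables become None."""
    subject, predicate, obj = (None if term.startswith("?") else term for term in pattern)
    return store.query_database(subject, predicate, obj)


store = InMemoryStore([
    ("ex:job1", "job:salary", "100000"),
    ("ex:job2", "job:salary", "150000"),
    ("ex:job2", "job:currency", "EUR"),
])
print(lookup(store, ("?job", "job:salary", "?salary")))
# [('ex:job1', 'job:salary', '100000'), ('ex:job2', 'job:salary', '150000')]
```

The appeal of such a minimal contract is that almost any triple store can satisfy it. The price is that the query engine does the rest of the work: for example, a pattern such as (?x, p, ?x) becomes two independent wildcards, so the engine must still check the returned triples for consistent bindings itself.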

Inkling also uses the JDBC interfaces for
issuing SquishQL queries. This makes the
implementation fairly independent of the database being searched, and
also means that Java programmers will already be familiar with the way
queries are run.

The second implementation of SquishQL was RDQL, part of the Jena RDF toolkit, which
combines querying with fine-grained manipulation of the RDF graph
through the Jena RDF API. RDQL is now obsolete and has been replaced by SPARQL. The
third implementation of SquishQL is RDFStore, which uses SquishQL
to query RDF repositories directly from Perl.

Starter pack

It is already evident that SquishQL will not become the mainstream RDF query language in
industry. However, thanks to its simplicity, it can serve as a starter pack for
RDF beginners.