The relationship between the Resource Description Framework (RDF), Topic
Maps, and various ontology languages is widely debated on the internet.
The introduction of further ontology languages such as OWL and SKOS has
added fuel to the fire. The W3C has attempted to establish standard
guidelines for RDF/Topic Maps interoperability by consolidating the
existing proposals for integrating RDF and Topic Maps data.

In this article I’ll try to analyze the development
background of both standards, and give you an overview of five different
relationship proposals. These five proposals have been chosen as being
sufficiently complete and well-documented to be suitable for detailed
examination.

The primary goal of W3C was to achieve interoperability
between RDF and Topic Maps at the data level. This means that it should
be possible to translate data from one form to the other without unacceptable
loss of information or corruption of the semantics. It should also be possible
to query the results of a translation in terms of the target model and it
should be possible to share vocabularies across the two paradigms.

For readers who are not familiar with Topic Maps, a good introduction can
be found in Steve Pepper’s well-known document “The TAO of Topic Maps:
Finding the Way in the Age of Infoglut”. It is also assumed that the
reader is familiar with RDF and its triples, as well as with Semantic Web
concepts.

Introduction

The Resource Description
Framework (RDF) is a model developed by the W3C for representing information
about resources in the World Wide Web. Topic Maps is a standard for knowledge
integration developed by the ISO. The two specifications were developed in
parallel during the late 1990s within their separate organizations for what at
first appeared to be very different purposes. However, it appears that they
have a lot in common.

A number of attempts
have been made to uncover the synergies between RDF and Topic Maps and to find
ways of achieving interoperability at the data level. The goal of W3C now is to
provide guidelines for users who want to combine the W3C’s RDF/OWL family of
specifications and the ISO’s family of Topic Maps standards. This article is a
survey of existing approaches and an analysis of their strengths and
weaknesses. A W3C Recommendation with guidelines on transforming is yet to be published.

So what are the proposals?

I have selected five different proposals for analysis: Moore, Stanford,
Ogievetsky, Garshol, and Unibo. They are referred to by the names of
their authors or, in the case of multiple authors, by the name of the
organization to which the authors are affiliated. Each proposal builds upon
and references previous work, and each is characterized here in terms of the
translation directions that it covers: RDF to Topic Maps (RDF2TM) and
Topic Maps to RDF (TM2RDF).

A word on terminology is also in order. TM is short for
Topic Maps, the name of the standard, the paradigm and (lower-cased) the
artifacts themselves. XTM (XML Topic Maps) is the standard interchange syntax.
Where XTM specifically is meant, it is named explicitly.

All
the existing approaches fall into two distinct categories that Moore originally
termed “modeling the model” and “mapping the model”. These
might be more appropriately termed “object mappings” and
“semantic mappings” respectively. The basic difference between the
two approaches can be summed up as follows:

  • Object mappings use the
    low-level building blocks of one language to describe the object model
    of the other. For example, assuming for now that the structure of a simple
    binary association is a quintuple consisting of one (a)ssociation, two
    (r)oles, and two role (p)layers (p-r-a-r-p), an object mapping would
    represent that association as four statements that relate five resources.
  • Semantic mappings start from
    higher level concepts that carry the semantics of each model and attempt
    to find equivalences between them. A binary association in Topic Maps
    would be seen to represent the same kind of “thing” that is
    often represented by an RDF statement (i.e., a relationship between two
    entities) and would therefore be represented using a single RDF statement.
    Where no direct semantic equivalent can be found, the missing semantics
    are defined using the facilities available in one of the two paradigms,
    i.e., classes, properties, or published subjects.
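
The contrast between the two kinds of mapping can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `ex:` identifiers, the `ex:hasRole` and `ex:playedBy` properties, and the function names are hypothetical and are not taken from any of the proposals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def object_mapping(assoc, role1, player1, role2, player2):
    """Describe the p-r-a-r-p structure itself, node by node.

    The property names ex:hasRole and ex:playedBy are hypothetical.
    """
    return [
        Triple(assoc, "ex:hasRole", role1),
        Triple(role1, "ex:playedBy", player1),
        Triple(assoc, "ex:hasRole", role2),
        Triple(role2, "ex:playedBy", player2),
    ]


def semantic_mapping(assoc_type, player1, player2):
    """Treat the association as a relationship between its two players."""
    return [Triple(player1, assoc_type, player2)]


def resources(triples):
    """Collect every distinct subject and object that appears."""
    return {t.subject for t in triples} | {t.obj for t in triples}


structural = object_mapping(
    "ex:assoc1", "ex:role1", "ex:puccini", "ex:role2", "ex:tosca"
)
semantic = semantic_mapping("ex:composed", "ex:puccini", "ex:tosca")

print(len(structural), len(resources(structural)))  # 4 5
print(len(semantic))  # 1
```

The object mapping yields four statements relating five resources, while the semantic mapping yields a single statement about the two role players.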

The Moore proposal

Moore’s was the first
proposal to address the issue of interoperability between RDF and Topic Maps.
Having presented the two models, Moore introduces the distinction between what
he calls “mapping the model” and “modeling the model”. The
key difference is that the former is “semantic”, whereas the latter
“uses each standard as a tool for describing other models”. Let’s
look at what this means in practice.

Moore’s RDF2TM object
mapping approach is based on defining PSIs for every RDF construct in his
model (i.e., resource, statement, property, subject, object, identity,
literal, and model) and expressing RDF statements as ternary associations of
type rdf-statement using the role types rdf-subject, rdf-property and
rdf-object. (A PSI, or Published Subject Identifier, is a stable, published
URI that identifies a subject unambiguously.) This raises issues with the
handling of literals (since role players in associations cannot be strings),
to which no solution is proposed.
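
A minimal sketch of this object mapping, and of where it breaks down, might look as follows. The association type and role type names come from the description above; the Python classes and the function name are hypothetical scaffolding, not part of Moore’s proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Association:
    assoc_type: str
    roles: tuple  # pairs of (role type, URI of the role-playing topic)


def statement_to_association(subject, prop, obj):
    """Express one RDF statement as a ternary rdf-statement association."""
    if isinstance(obj, Literal):
        # Role players must be topics, so a string value has nowhere to go.
        raise ValueError("a literal cannot play a role in an association")
    return Association(
        assoc_type="rdf-statement",
        roles=(
            ("rdf-subject", subject.uri),
            ("rdf-property", prop.uri),
            ("rdf-object", obj.uri),
        ),
    )


assoc = statement_to_association(
    Resource("http://example.com/X"),
    Resource("http://example.com/W"),
    Resource("http://example.com/Z"),
)
print([role for role, _ in assoc.roles])
```

Calling the function with a `Literal` as the object raises an error, which mirrors the unresolved literal problem noted above.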

The TM2RDF object
mapping approach is based on defining RDF properties for each TM construct as
follows: topic, topicassoc, instanceof, topicassocmember, roleplayingtopic, roledefiningtopic, topicoccur, topicname, topicnamevalue, scopeset, subjindicatorref, resourceref.

Moore’s object mapping
approach is reasonably complete, whereas his semantic mapping approach is just
a sketch that focuses on RDF statements and associations. Neither approach is
reversible. In the case of the object mapping approach, the assumption is that
one is working in one domain or the other, but not in both.

In the case of the
semantic mapping approach, the fact that a statement maps to a single
association whereas an association maps to two statements shows that
translations cannot be reversed. Semantic mappings are
shown to be superior to object mappings in terms of naturalness. The latter
yields unnatural results in both directions. Whatever the direction, a
“natural” source document leads to an “unnatural” result
and achieving a “natural” result is only possible if the starting
point is “unnatural”. In the object mapping example given in Moore’s
proposal, a simple binary association translates to 22 RDF statements.

The Stanford proposal

The main idea is to make
it possible to query Topic Maps using an “RDF-aware infrastructure”.
This proposal is thus TM2RDF only. Reference is made to the layered integration
model of data interoperability, which separates the data integration problem
into three quasi-independent layers: the syntax layer, the object layer, and
the semantic layer. The idea is to build an RDF representation of the topic map
on the object layer and then perform a “bijective graph transformation” such
that the topic map can be viewed as RDF. Ignoring the syntax layer means that
the approach will work with both the SGML and the XML serialization syntaxes
of Topic Maps.

The approach ignores the semantic
layer so that, according to the authors, all information is preserved. Instead
of defining their own model for Topic Maps, the authors use PMTM4, the
Processing Model for Topic Maps proposed by Newcomb and Biezunski
(“Topicmaps.net’s Processing Model for XTM 1.0”).
In short, PMTM4 is a graph model consisting of three node types (for topics,
associations, and scopes) and four arc types: associationMember (aM),
associationScope (aS), associationTemplate (aT), and scopeComponent (sC).
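
The shape of such a graph can be sketched in a few lines. This is only an illustration of the three node types and four arc types named above. The arc directions, the node labels, and the flattening into plain triples are assumptions made for the example, and the output is not the RDF encoding the Stanford authors actually produce.

```python
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    TOPIC = "topic"
    ASSOCIATION = "association"
    SCOPE = "scope"


class ArcKind(Enum):
    ASSOCIATION_MEMBER = "associationMember"      # aM
    ASSOCIATION_SCOPE = "associationScope"        # aS
    ASSOCIATION_TEMPLATE = "associationTemplate"  # aT
    SCOPE_COMPONENT = "scopeComponent"            # sC


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    label: str


@dataclass(frozen=True)
class Arc:
    kind: ArcKind
    source: Node
    target: Node


def to_triples(arcs):
    """Flatten arcs into (source, property, target) tuples."""
    return [(a.source.label, "tms:" + a.kind.value, a.target.label) for a in arcs]


# Hypothetical example graph: one scoped, typed, binary association.
a1 = Node(NodeKind.ASSOCIATION, "a1")
s1 = Node(NodeKind.SCOPE, "s1")
puccini = Node(NodeKind.TOPIC, "puccini")
tosca = Node(NodeKind.TOPIC, "tosca")
composed_by = Node(NodeKind.TOPIC, "composed-by")
opera = Node(NodeKind.TOPIC, "opera")

arcs = [
    Arc(ArcKind.ASSOCIATION_MEMBER, a1, puccini),
    Arc(ArcKind.ASSOCIATION_MEMBER, a1, tosca),
    Arc(ArcKind.ASSOCIATION_TEMPLATE, a1, composed_by),
    Arc(ArcKind.ASSOCIATION_SCOPE, a1, s1),
    Arc(ArcKind.SCOPE_COMPONENT, s1, opera),
]

for triple in to_triples(arcs):
    print(triple)
```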

Having constructed an
RDF graph from the topic map, the authors show how it can be queried, together with
native RDF data, by a single query expressed in a special
logic syntax. The query in the following example uses the RDF-encoded topic map
to find all countries that have petroleum as a natural resource and then
extracts links to DMOZ Travel_and_Tourism pages for
those countries from the RDF-encoded Open Directory (See example1.txt):

Example 1

FORALL pages <- Country, DMOZCountry Y,X, Z
    Y[tms:roleLabel->country;rdf:object->Country]
        @CIA_WORLD_FACTBOOK and
    X[tms:roleLabel->natural-resource;
      rdf:object->petroleum;
      rdf:subject->Z[tms:associationMember->Country]
        @CIA_WORLD_FACTBOOK]
        @CIA_WORLD_FACTBOOK and
    Country[mapsTo->DMOZCountry] and
    DMOZCountry[Travel_and_Tourism ->dmozpage[links->pages]]
        @DMOZ.

The Stanford approach is
complete with respect to PMTM4, but the latter is not a complete model for
Topic Maps since it does not handle URIs and strings.
The Stanford proposal itself is therefore not complete. The proposal does not
score well in terms of naturalness since it requires upwards of 20 statements
to represent information that would naturally be modeled using two statements
in RDF.

The Ogievetsky proposal

In this proposal, the
author describes both a method for transforming topic maps expressed in XTM
syntax to RDF and the author’s XSLT-based implementation of this approach in
the XTM2RDF Translator. Transformations are described in terms of the
processing of XTM elements, so the approach is very syntax-oriented. The
resulting RDF conforms to a vocabulary (called RTM) which consists of 11
classes and 17 properties defined partly in terms of XTM itself and partly in
terms of PMTM4, the “processing model” proposed by Newcomb and Biezunski
and described in the preceding section.

The classes and
properties defined by the RTM vocabulary are:

  • rdfs:Class: t-node,
    topic, scope, member, association, basename, variantname, occurrence, class-subclass,
    class-instance, templaterpc.
  • rdf:Property: association-role,
    validIn, indicatedBy, constitutedBy, name, templatedBy,
    role-topic, role-basename, role-variantname, role-occurrence, role-superclass,
    role-subclass, role-class, role-instance, role-template, role-role, role-rpc.

The mapping is pretty
simple: each <topic> element results in the creation of an RDF statement
of type rtm:topic.
The topic’s subject locator (if any) becomes the URI of the subject of the statement;
otherwise a blank node is created. Subject identifiers (if any) result in
properties of type rtm:indicatedBy.

Associations are
represented as blank nodes whose type corresponds to the association type. In
addition, for each role in the association there is one statement whose
property corresponds to the role type (e.g. ns1:composer and ns1:work in the example below); its value is a node of type rtm:member that references the role player. Referencing is done
through an rtm:indicatedBy property when the role player has a subject identifier and an rtm:constitutedBy property when the role player has a subject locator.
(The text does not state what form the reference takes when the role player has
neither.)

The following example
shows how the association between Tosca and Puccini is represented in RDF/XML
in “third RDF basic abbreviated form” (See example2.txt):

Example 2

<ns1:composed-by>
  <ns1:composer>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://en.wikipedia.org/wiki/Puccini" />
    </rtm:member>
  </ns1:composer>
  <ns1:work>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="http://psi.ontopia.net/opera/#tosca" />
    </rtm:member>
  </ns1:work>
</ns1:composed-by>

There is a very obvious
similarity between the syntax shown above and XTM, which could indicate that
the desire to output readable RDF/XML syntax (and perhaps the exigencies of
XSLT-based processing) has influenced the form of RDF chosen for the target
model.

String values for names
and internal occurrences are represented as the values of rtm:name
properties of member nodes. The following example shows the base name of the
composer Puccini as output by the xtm2rdf.xsl XSLT stylesheet (See example3.txt). A blank node represents
the topic-basename relationship. Syntactically, the rtm:baseName
construct has exactly the same “shape” as the association shown
above:

Example 3

<rtm:baseName rdf:ID="XSLTbaseName122124120120">
  <rtm:role-topic>
    <rtm:member>
      <rtm:indicatedBy rdf:resource="#puccini" />
    </rtm:member>
  </rtm:role-topic>
  <rtm:role-name>
    <rtm:member>
      <rtm:name>Giacomo Puccini</rtm:name>
    </rtm:member>
  </rtm:role-name>
</rtm:baseName>

As with binary
associations, seven RDF statements are required to represent a single topic
name characteristic that would naturally be modeled using a single statement in
RDF.
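
The count of seven can be checked with a short sketch that builds the statements implied by the two RDF/XML fragments above: one type statement for the construct, plus three statements per role. The function name and the blank node labels are hypothetical, and the sketch only counts triples; it does not reproduce the XSLT stylesheet.

```python
from itertools import count

RDF_TYPE = "rdf:type"


def rtm_construct(construct_type, roles):
    """Build the triples for one RTM-style construct.

    roles is a list of (role property, member property, value) tuples.
    """
    ids = count(1)
    node = f"_:b{next(ids)}"
    triples = [(node, RDF_TYPE, construct_type)]
    for role_property, member_property, value in roles:
        member = f"_:b{next(ids)}"
        triples.append((node, role_property, member))        # role statement
        triples.append((member, RDF_TYPE, "rtm:member"))     # member node type
        triples.append((member, member_property, value))     # reference or string
    return triples


association = rtm_construct(
    "ns1:composed-by",
    [
        ("ns1:composer", "rtm:indicatedBy", "http://en.wikipedia.org/wiki/Puccini"),
        ("ns1:work", "rtm:indicatedBy", "http://psi.ontopia.net/opera/#tosca"),
    ],
)
base_name = rtm_construct(
    "rtm:baseName",
    [
        ("rtm:role-topic", "rtm:indicatedBy", "#puccini"),
        ("rtm:role-name", "rtm:name", "Giacomo Puccini"),
    ],
)

print(len(association), len(base_name))  # 7 7
```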

The author also shows
how such “RDF Topic Maps” can be queried (using the RDF query
language SquishQL) and constrained (using
DAML+OIL). The following sample query (See example4.txt)
shows how to find all topics that have names in the scope “taxon”:

Example 4

SELECT ?topic, ?name
FROM  http://www.cogx.com/xtm2rdf/seacr.rtm#
WHERE
  (rdf::type ?a rtm::basename)
  (rtm::role-topic ?a ?m1) (rtm::indicatedBy ?m1 ?topic)
  (rtm::role-name ?a ?m2) (rtm::name ?m2 ?name)
  (rtm::validIn ?a ?s) (rtm::indicatedBy ?s this::taxon)
USING
  rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rtm FOR http://www.cogx.com/xtm2rdf/rtm.rdf#
  this FOR  http://www.cogx.com/xtm2rdf/seacr.rtm#

The proposal appears to
be fairly complete in that it covers more-or-less every aspect of XTM syntax
(which requires extending the underlying PMTM4 model in order to cater for
identifiers). The proposal requires seven statements to represent information
content that would naturally be modeled using one statement in RDF and thus
rates very low in terms of naturalness. Translating the Topic
Maps test case results in an RDF document containing 125 statements.

The Garshol proposal

This proposal was
originally presented as part of a comparative analysis of the RDF and Topic
Maps models. The analysis was further developed (and extended to partially
address OWL). The approach has been implemented by the author in the Ontopia Knowledge Suite.

The author starts by
comparing RDF and Topic Maps through an examination of concepts that are
fundamental to both paradigms: “symbols and things”,
“assertions”, “identity”, “reification”,
“qualification”, and “types and subtypes”. For each
concept, Garshol shows how it is expressed in each
paradigm and draws out the similarities and differences.

According to Garshol, RDF and Topic Maps are both “identity-based
technologies”; that is, the key concept in both is symbols
representing identifiable things about which assertions can be
made. In Topic Maps, “things” are called “subjects”; in RDF
they are called “resources” and, despite different definitions, they
are essentially the same concept. Subjects are represented by topics; resources
are represented by RDF nodes (or “nodes” for short). According to Garshol, the correspondence between “topic” and
“node” is close but not exact.

Assertions express relationships
between things and take the form of “topic characteristics” in Topic
Maps and “statements” in RDF. A topic characteristic can be a name,
an occurrence, or an association. An RDF statement can thus in theory be mapped
to any one of these three kinds of construct. Special attention is paid to associations
since these can be of any arity, whereas all RDF statements are binary. A
binary association maps fairly well to an RDF statement, but a non-binary
association does not.

The
concept of types and subtypes, on the other hand, is regarded as being
identical in Topic Maps and RDF (except for the fact that the subClassOf
property is part of RDF Schema rather than RDF itself).

The
author considered object-mapping approaches described in previous proposals as
heavy-weight and rather awkward to work with. As an alternative, Garshol proposes to use vocabulary-specific mappings
underpinned by a generic mapping. Statements should in general be mapped to
names, occurrences or associations since this provides the most
“natural” results. However, it is not possible to know which of these
is most appropriate for any given statement without an understanding of the
semantics of the property in question — hence the need for vocabulary-specific
mappings.

For example, the
RDF statement:

<http://example.com/X>
  <http://example.com/Y>
  "foo" .

could be mapped in Topic Maps to either a name or an internal occurrence
(since the object is a literal). Similarly, the statement:

<http://example.com/X>
  <http://example.com/W>
  <http://example.com/Z> .

could be mapped to either an association or an external
occurrence (since the object is a resource). An optimal semantic translation
cannot be performed without knowledge of the semantics of the properties Y and W.

For RDF2TM mapping, the
solution is to provide additional mapping information. This is done using an
RDF vocabulary called RTM which is used to annotate RDF documents (or their
schemas) and thus guide the translation process. The RTM vocabulary is used for
translating from RDF to Topic Maps and consists of the following RDF properties:
maps-to, type, in-scope, subject-role, and object-role.
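
The role of the mapping information can be illustrated with a small sketch. It assumes a plain dictionary that stands in for maps-to annotations; the construct names used as values, the function names, and the result format are hypothetical and are not the actual RTM vocabulary or the Ontopia implementation.

```python
def candidates(object_is_literal):
    """Topic Maps constructs a statement could map to, by kind of object."""
    if object_is_literal:
        return ("name", "internal occurrence")
    return ("association", "external occurrence")


def translate(statement, mapping):
    """Pick the target construct for one statement from a per-property mapping.

    statement is (subject, property, object, object_is_literal).
    """
    subject, prop, obj, object_is_literal = statement
    if prop not in mapping:
        raise KeyError(f"no mapping supplied for property {prop}")
    target = mapping[prop]
    if target not in candidates(object_is_literal):
        raise ValueError(f"{target} is not a valid target for this statement")
    return {"construct": target, "topic": subject, "type": prop, "value": obj}


# Hypothetical mapping: Y is declared to be a naming property, W a relationship.
mapping = {
    "http://example.com/Y": "name",
    "http://example.com/W": "association",
}

named = translate(
    ("http://example.com/X", "http://example.com/Y", "foo", True), mapping
)
related = translate(
    ("http://example.com/X", "http://example.com/W", "http://example.com/Z", False),
    mapping,
)
print(named["construct"], related["construct"])  # name association
```

Without an entry for a property, the sketch refuses to guess, which reflects the point that the choice depends on the semantics of the property rather than on the form of the statement.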

For TM2RDF additional
information is required in order for optimal and/or predictable results to be
achieved. As with the RDF2TM translation, the implementations provide some
level of defaulting. Both subject identifiers and subject locators are
automatically mapped to resource URIs. In addition,
associations can be exported to RDF in the absence of mapping information about
roles; in this case the choice of subject and object for the resulting statement
is arbitrary.

As currently specified
the Garshol proposal provides an almost
complete solution and the author himself identifies most of the respects in
which it is incomplete. Those which are not mentioned include containers,
collections, XML literals and typed literals. A high degree of reversibility
and round-tripping is achievable, provided appropriate reverse mappings are
generated during the translation. An issue exists with subject locators that
end up as subject identifiers when round-tripping from Topic Maps to RDF and
back to Topic Maps.
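
The subject locator issue follows directly from the defaulting described above, as this sketch suggests. The `Topic` class and both function names are hypothetical, and the import step assumes that, without a reverse mapping, every resource URI is read back as a subject identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Topic:
    subject_identifier: Optional[str] = None
    subject_locator: Optional[str] = None


def topic_to_rdf_uri(topic):
    """TM2RDF default: both kinds of identity become a plain resource URI."""
    return topic.subject_identifier or topic.subject_locator


def rdf_uri_to_topic(uri):
    """RDF2TM default (assumed): a resource URI becomes a subject identifier."""
    return Topic(subject_identifier=uri)


original = Topic(subject_locator="http://example.com/report.pdf")
round_tripped = rdf_uri_to_topic(topic_to_rdf_uri(original))

print(original)
print(round_tripped)
print(round_tripped == original)  # False
```

The URI survives the round trip, but the information that it was a locator rather than an identifier does not.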

The Unibo proposal

The
authors of the Unibo proposal clearly prefer Garshol’s approach to those
described earlier, because it produces much more “readable” results and is
similar to their own. The main difference is that Garshol does
not utilize the “standard RDF and RDFS predicates” and thus always
requires a mapping to be specified.

Like
earlier authors, Ciancarini et al. recognize that
there are two fundamental approaches to tackling the problem of translation,
corresponding to what this survey calls object mapping and semantic mapping.
The first of these is seen to be problematic in that “the converted
document is necessarily very different from the one that would have been
written directly in the destination language, and hardly readable.” The
problem with the second one is that it is “not always possible” to
identify semantic equivalences, and that doing so often requires a case-by-case
approach and thus has no general usefulness.

The
authors therefore consider a hybrid approach to be the optimal solution and
their implementation in the Meta Converter combines a generic mapping, which
tries to stay as close as possible to the original semantics, with the ability
to define specific mappings using an XML vocabulary.

The Unibo
proposal is fairly complete but some features, e.g., language tags and data
typing in RDF, and reification of roles and topic maps, are not covered
explicitly. The proposal permits some degree of reversibility, but the result
of a roundtrip may not always be the same as the starting point. For example,
using the generic mappings, most RDF statements would be translated to typed
associations with untyped roles, each of which would
result in several statements when translated back to RDF.

The
approach produces somewhat natural results in both directions provided mapping
information is supplied. Generic translations are far less satisfactory, with a
single binary association resulting in nine RDF statements.

Resolving remaining issues

Among the several
possible criteria for evaluating these proposals, two — completeness and
naturalness — have been selected as the most relevant and appropriate for
evaluating the qualities and limitations of each proposal.

Completeness is defined as the extent to which any semantic
structure in the source model is correctly (i.e., without losing or adding
information) translated into the destination model. It provides a clear
indication of the semantic power of each translation approach.

Naturalness is defined as the extent to which a translated model
resembles in structure and content an equivalent model expressed directly in
the target paradigm. It provides an indication of how well the translated
result can merge and interact with other models expressed in the same
paradigm.

The analysis of the
proposals identified two main approaches towards translation, which we dubbed
“object mapping” (providing a translation of every structural
component of the source paradigm) and “semantic mapping” (providing a
structure corresponding to every conceptual structure of the source model).

The analysis of the
options and solutions provided in the literature, therefore, clearly shows the
advantages of semantic mapping, but at the same time lists the issues that need
to be addressed and solved in any future translation approach. However, now
that both RDF and Topic Maps have formal data models, and with the help of RDF
Schema and OWL, it seems likely that most, if not all, of the issues we have
listed here can be resolved without resorting to the restricted
interoperability offered by object mapping.