
Automatically generate a Web site map using RDF and Jena

The foundation for metadata interoperability is the Resource Description Framework. Using Jena and RDF, you can automate the generation of a Web-standard site map and increase your site's ability to communicate with other applications.


The Resource Description Framework (RDF), developed by the W3C, provides the foundation for metadata interoperability across different resource description communities. One of the major obstacles facing these communities is the multiplicity of incompatible standards for metadata syntax and schema definition languages. In this article, I will show you how to develop a Web-standard site map for use in your site engine with Jena, a Java framework for RDF.

About RDF
RDF is a language for representing information about resources on the World Wide Web. In particular, it is intended to be a means for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource.

However, by generalizing the concept of a Web resource, RDF can also be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web. RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning.

Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created.

RDF is based on the idea of identifying items using Uniform Resource Identifiers (URIs) and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources and their properties and values.

RDF also provides an XML-based syntax (called RDF/XML) for recording and exchanging these graphs. Like HTML, RDF/XML is machine-readable and, using URIs, can link pieces of information across the Web. The result is that in addition to describing such things as Web pages, we can also describe cars, businesses, people, news events, etc. In addition, RDF properties themselves have URIs to precisely identify the kind of relationship that exists between the linked items.

Site map usage
We will use the term site map to refer to hierarchically oriented representations of Web content, such as what might appear in a table of contents for a Web site, a bookmarking system in a browser, a push channel, or a tree-control user interface widget. Many Web sites offer a page presenting users with a hierarchical (and possibly interactive and incrementally displayed) overview of a large, structured database or document collection. Because RDF gives otherwise incompatible formats a common way to express this kind of information, software that reads RDF can generate such a site map automatically. Let's examine how you can use a site map structure like this within your Web site engine.

Deploying Java to RDF
Jena is a Java API that can be used to create and manipulate RDF graphs. Jena has object classes to represent graphs, resources, properties, and literals; a graph is called a model and is represented by the Model interface.

The following is a site map of a very simple Web site that contains only one page, index.xml:
<?xml version="1.0" ?>

<!--
  RDF Site Map
  site.rdf
  Summer 2003
-->

<rdf:RDF
    xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xite="http://rdf.sigent.ru/xite/elements/1.0/"
    xmlns:cms="http://rdf.sigent.ru/cms/elements/1.0/"
    xmlns:site="http://rdf.sigent.ru/site/elements/1.0/"
>

  <rdf:Description rdf:about="http://www.website.com/">

    <dc:title>Website.com</dc:title>
    <dc:creator>WebConsults * http://www.wc.com</dc:creator>
    <dc:contributor></dc:contributor>
    <dc:subject></dc:subject>
    <dc:date>2003-07-02</dc:date>
    <dc:language>English</dc:language>
    <dc:publisher>WebConsults</dc:publisher>
    <dc:rights>2003 Website</dc:rights>

    <xite:urlScheme>http</xite:urlScheme>

    <xite:page>
      <rdf:Description rdf:about="http://www.website.com/index.xml">
        <site:title></site:title>
        <site:alias>/</site:alias>
        <xite:CMS>
          <rdf:Description>
            <cms:method>select</cms:method>
            <cms:map>news</cms:map>
            <cms:alias>news-list</cms:alias>
          </rdf:Description>
        </xite:CMS>
      </rdf:Description>
    </xite:page>

  </rdf:Description>

</rdf:RDF>



Several namespaces are declared on the main document element, rdf:RDF. The dc namespace refers to the Dublin Core Element Set, a standard for cross-domain information resource description. The xite, site, and cms namespaces are specific to the Web application and will be used by the Web site engine through a Java RDF parser. The following Java class reads our site map and distills actual data for the site engine:
package test;

import com.hp.hpl.mesa.rdf.jena.mem.ModelMem;
import com.hp.hpl.mesa.rdf.jena.model.*;
import com.hp.hpl.mesa.rdf.jena.common.*;
import com.hp.hpl.mesa.rdf.jena.vocabulary.*;
import java.io.FileReader;
import java.util.*;

public class SitemapReader {

    // Property URIs: the namespace URI followed by the element's local name
    public static final String URL_SCHEME =
        "http://rdf.sigent.ru/xite/elements/1.0/urlScheme";
    // CMS properties (declared here but not used in this excerpt)
    public static final String METHOD =
        "http://rdf.sigent.ru/cms/elements/1.0/method";
    public static final String MAP =
        "http://rdf.sigent.ru/cms/elements/1.0/map";
    public static final String ALIAS =
        "http://rdf.sigent.ru/cms/elements/1.0/alias";

    // Default URL scheme read from the site map; stays null if reading fails
    public String urlScheme = null;

    public SitemapReader() {
        try {
            // Create a new empty model
            Model model = new ModelMem();
            // Read an RDF file into the in-memory model
            model.read(new FileReader("Listing-A.rdf"), "");

            // Predicate for filtering the urlScheme element
            Property predicate = new PropertyImpl(URL_SCHEME);
            // null subject and object act as wildcards
            Selector selector =
                new SelectorImpl(null, predicate, (RDFNode) null);

            // Filtering: take the first matching statement, if any
            StmtIterator iter = model.listStatements(selector);
            if (iter.hasNext()) {
                urlScheme = iter.next().getObject().toString();
            } else {
                throw new Exception(
                    "No default URL scheme (urlScheme) defined in sitemap. Shutdown.");
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("RDF sitemap initialization failed.");
        }
    }
}


Such data can include, for example, which CMS modules a particular page needs. The class begins with some constant definitions and then creates an empty graph, or model. ModelMem is a class that implements the Model interface and holds all its data in main memory. Jena contains other implementations of the Model interface (e.g., one that stores its data in a Berkeley DB database, and another that uses a relational database). After the RDF model is read into memory by the read() method, we can begin to navigate it.
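
The constants at the top of the class are property URIs: each one is the namespace URI from the site map followed by the element's local name. The sketch below illustrates only that naming rule, using the JDK's DOM parser rather than Jena; the class name PropertyUriSketch and the trimmed-down inline site map are assumptions made for illustration, and a real RDF parser does considerably more than this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Jena or of the article's listing
public class PropertyUriSketch {

    static final String XITE_NS = "http://rdf.sigent.ru/xite/elements/1.0/";

    // Trimmed-down copy of the site map, kept inline so the sketch is self-contained
    static final String SITE_MAP =
          "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n"
        + "         xmlns:xite=\"http://rdf.sigent.ru/xite/elements/1.0/\">\n"
        + "  <rdf:Description rdf:about=\"http://www.website.com/\">\n"
        + "    <xite:urlScheme>http</xite:urlScheme>\n"
        + "  </rdf:Description>\n"
        + "</rdf:RDF>";

    /** Returns {property URI, value} for the first xite:urlScheme element, or null. */
    static String[] findUrlScheme(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to see namespace URIs
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList nodes = doc.getElementsByTagNameNS(XITE_NS, "urlScheme");
            if (nodes.getLength() == 0) {
                return null;
            }
            Element element = (Element) nodes.item(0);
            // Property URI = namespace URI + local name
            String propertyUri = element.getNamespaceURI() + element.getLocalName();
            return new String[] { propertyUri, element.getTextContent() };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse site map", e);
        }
    }

    public static void main(String[] args) {
        String[] result = findUrlScheme(SITE_MAP);
        System.out.println(result[0] + " = " + result[1]);
    }
}
```

The URI it builds is the same string as the URL_SCHEME constant in SitemapReader, which is why Jena can match the property by that constant.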

Each arc in an RDF graph is called a statement, and each statement asserts a fact about a resource. A statement has three parts: The subject is the resource from which the arc leaves, the predicate is the property that labels the arc, and the object is the resource or literal pointed to by the arc. A statement is sometimes called a triple, because of its three parts.

An RDF graph is represented by a set of statements. Jena's Model interface defines a listStatements() method, which returns an iterator over all the statements in a graph. Each time the iterator's next() method is called, it returns a Jena object of the type Statement. The Statement interface provides accessor methods for the subject, predicate, and object of a statement.
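
To make the idea of a statement concrete, here is a minimal plain-Java sketch of a graph as a list of triples. It does not use Jena: the StatementSketch class and its nested Triple type are hypothetical stand-ins for Jena's Model and Statement, and the two sample triples are taken from the site map shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; Jena's real Statement interface is richer than this
public class StatementSketch {

    /** One arc of an RDF graph: subject --predicate--> object. */
    static final class Triple {
        final String subject;   // URI of the resource the arc leaves
        final String predicate; // URI of the property that labels the arc
        final String object;    // URI or literal value the arc points to

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        String getSubject()   { return subject; }
        String getPredicate() { return predicate; }
        String getObject()    { return object; }
    }

    /** Builds a two-statement graph describing the site's root resource. */
    static List<Triple> sampleGraph() {
        List<Triple> graph = new ArrayList<Triple>();
        graph.add(new Triple("http://www.website.com/",
                             "http://purl.org/dc/elements/1.1/title",
                             "Website.com"));
        graph.add(new Triple("http://www.website.com/",
                             "http://rdf.sigent.ru/xite/elements/1.0/urlScheme",
                             "http"));
        return graph;
    }

    public static void main(String[] args) {
        // Walk every statement and read its three parts through the accessors
        for (Triple t : sampleGraph()) {
            System.out.println(t.getSubject() + " " + t.getPredicate() + " " + t.getObject());
        }
    }
}
```

Both statements share the same subject, so in graph terms they are two arcs leaving one node.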

The primitive query method model.listStatements(Selector s) returns an iterator over all the statements in the model that are selected by s. The Selector interface is designed to be extensible, but for now, there is only one implementation of it, the class SelectorImpl. In our case, SelectorImpl(null, predicate, null) will select all of the statements with predicate as their predicate, whatever the subject or object.
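
The wildcard behavior of such a selector can be sketched in a few lines of plain Java. This is only an illustration of the matching rule described above, not Jena's implementation: the SelectorSketch class, its select() method, and the representation of a statement as a three-element String array are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of null-as-wildcard matching; not Jena code
public class SelectorSketch {

    static final String URL_SCHEME = "http://rdf.sigent.ru/xite/elements/1.0/urlScheme";

    /** A null pattern slot matches anything; otherwise the values must be equal. */
    static boolean slotMatches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    /** Returns every {subject, predicate, object} triple that matches the pattern. */
    static List<String[]> select(List<String[]> graph,
                                 String subject, String predicate, String object) {
        List<String[]> hits = new ArrayList<String[]>();
        for (String[] triple : graph) {
            if (slotMatches(subject, triple[0])
                    && slotMatches(predicate, triple[1])
                    && slotMatches(object, triple[2])) {
                hits.add(triple);
            }
        }
        return hits;
    }

    /** Two statements taken from the site map shown earlier. */
    static List<String[]> sampleGraph() {
        List<String[]> graph = new ArrayList<String[]>();
        graph.add(new String[] { "http://www.website.com/",
                                 "http://purl.org/dc/elements/1.1/title",
                                 "Website.com" });
        graph.add(new String[] { "http://www.website.com/", URL_SCHEME, "http" });
        return graph;
    }

    public static void main(String[] args) {
        // Same shape as SelectorImpl(null, predicate, null): any subject, any object
        for (String[] triple : select(sampleGraph(), null, URL_SCHEME, null)) {
            System.out.println("urlScheme = " + triple[2]);
        }
    }
}
```

Passing null in all three slots would return every statement in the graph.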

Deploying to a hierarchical semantic-loaded environment
This is only a small part of what the Jena package can do. One of its most useful features is that it can write (serialize) RDF models back out after modifying them. And because it is an open source, standards-based solution, it can easily be built into data-driven applications that need to exchange hierarchical data with some semantic value.
