Here's an overview of the Semantic Web standards RDF and SPARQL and a look at two real-world applications that have emerged from the Semantic Web concept.
In my previous IT Leadership article, I talked about the idea that content management considers not only the content but also its authors and audiences. When we think about audiences, we usually refer to people with different interests or intentions, such as researchers or customers; computers or other applications can be seen as a type of audience as well.
To borrow a famous example from the World Wide Web Consortium (W3C): wouldn't it be useful if my calendar application could read and interpret my photographs' properties so it could place the photos into a timeline, or if it could understand my bank statements' properties so it could match the statements with my daily activities? This idea of machine access to data was central to the original idea of the Semantic Web, as proposed by Tim Berners-Lee. Although the focus of the Semantic Web has evolved in the eight years since Berners-Lee's seminal article was published, the central concept still applies: a Web composed of data linked together across applications and audiences (including machine audiences), with rich properties that go beyond metadata to identify data characteristics and data connections.
I won't attempt to present a tutorial on basic Semantic Web concepts here; you can find that in a previous TechRepublic post and at the Semantic Web FAQ. Before I highlight two real-world applications that have emerged from this concept, I think it's important to provide an overview of two Semantic Web standards, Resource Description Framework (RDF) and SPARQL, which are essential to understanding the applications we'll review.
RDF
RDF is a widely adopted W3C standard (formally, a W3C Recommendation) that offers a model for describing data on the Web. RDF is specifically designed to enable machine-accessible descriptions of Web-based data, so that applications and query engines other than the one originally used to create the data can parse and interpret it and make connections that add context and meaning.
Using RDF, we can tell any application that understands RDF, such as a query engine that supports SPARQL, that "there is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is firstname.lastname@example.org, and whose title is Dr." This classic example from the W3C illustrates the basic structure of an RDF description. The link, which looks like a typical Web URL but is referred to as a Uniform Resource Identifier (URI), identifies the resource being described for the reader (human or machine), and the other elements tell us what is being said about it. (See a graphic for the Eric Miller example.) RDF descriptions are often written in an XML format (RDF/XML), but they don't have to be. Like the Dewey Decimal System, RDF provides a standard structure for accessing underlying data; unlike the Dewey Decimal System, RDF is expressly designed to enable computers and applications to interpret and connect that data in meaningful ways.
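The Eric Miller description can be sketched as a handful of subject-predicate-object statements, usually called triples. The Python sketch below uses only the standard library. The property URIs are taken from the contact vocabulary used in the W3C's own version of this example, the mailbox is the placeholder address quoted above, and the helper name `to_ntriples` is hypothetical. It is a minimal illustration of the data model, not a full RDF serializer.

```python
# Namespaces used in the W3C's version of the Eric Miller example.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"

ERIC = "http://www.w3.org/People/EM/contact#me"

# Each statement is a (subject, predicate, object) triple.
# Objects are tagged as ("uri", value) or ("literal", value).
triples = [
    (ERIC, RDF + "type", ("uri", CONTACT + "Person")),
    (ERIC, CONTACT + "fullName", ("literal", "Eric Miller")),
    # Placeholder address, as quoted in the article text.
    (ERIC, CONTACT + "mailbox", ("uri", "mailto:firstname.lastname@example.org")),
    (ERIC, CONTACT + "personalTitle", ("literal", "Dr.")),
]


def to_ntriples(statements):
    """Render triples in the line-based N-Triples syntax (hypothetical helper).

    Only backslashes and double quotes are escaped, which is enough for
    this example but not for arbitrary literals.
    """
    lines = []
    for subject, predicate, (kind, value) in statements:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Each printed line is one statement about the same URI, which shows that the XML form is only one of several ways to write the same description down.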
For a real-world example of the advantages of RDF, let's look at Google's Rich Snippets feature. According to Google, "Rich Snippets give users convenient summary information about their search results at a glance." When a user searches for, say, restaurants on a Google map, Google's search engine looks for specific structured markup in the page, such as RDFa (a way of embedding RDF in HTML) or microformats, and includes the referenced information as summary data on the search result. Webmasters and developers who incorporate this markup into their pages enable richer, more compelling results in Google searches, and make the data available to future Semantic Web applications.
SPARQL
SPARQL (or "sparkle," as it's informally called) is a query language like SQL; however, unlike SQL, which is typically associated with a specific implementation, application, and database, SPARQL is designed to query RDF datasets that may be spread all across the Web. Text, images, a video's audio files, RDF XML references -- any data described in conformance with the RDF standard should be accessible to users and applications. As Jim Rapoza notes in his eWeek article about SPARQL, "I'm thinking that there are some pretty clever people out there who could build some very cool applications with that kind of power." And indeed there are.
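To make the comparison with SQL more concrete, here is a minimal sketch of the idea behind a SPARQL SELECT. A query is a set of triple patterns containing variables, and the engine returns the variable bindings that match the data. The query text shown in the comment is standard SPARQL. The in-memory matcher `match_pattern` is a hypothetical toy written for illustration (a real engine handles multiple patterns, joins, filters, and remote data), and the sample data reuses the W3C contact example.

```python
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
ERIC = "http://www.w3.org/People/EM/contact#me"

# A tiny RDF-like dataset: (subject, predicate, object) triples.
DATA = [
    (ERIC, CONTACT + "fullName", "Eric Miller"),
    (ERIC, CONTACT + "personalTitle", "Dr."),
]

# The equivalent SPARQL query would read:
#   PREFIX contact: <http://www.w3.org/2000/10/swap/pim/contact#>
#   SELECT ?person ?name WHERE { ?person contact:fullName ?name . }
PATTERN = ("?person", CONTACT + "fullName", "?name")


def match_pattern(pattern, data):
    """Return one dict of variable bindings per triple that matches the pattern."""
    results = []
    for triple in data:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                # A fixed term must match the data exactly.
                break
        else:
            # No mismatch found: the whole triple matches the pattern.
            results.append(bindings)
    return results


print(match_pattern(PATTERN, DATA))
```

The key difference from SQL shows up in the pattern itself: it names no table or schema, only URIs that any RDF dataset on the Web could use.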
DBpedia
DBpedia is a fascinating illustration of the most important elements of the Semantic Web. DBpedia isn't exactly an application, although it does have a search capability that is accessible to casual users; it's not exactly a database either, as it does not contain the actual data, but instead contains the RDF information that points to and categorizes data contained in Wikipedia.
Working in multiple languages, DBpedia extracts RDF descriptions from Wikipedia and then adds them to its "knowledge base" of URIs identified as, for example, people, places, films, video games, organizations, and music albums. Using W3C's Simple Knowledge Organization System (SKOS) as a taxonomy, it categorizes this data to make it accessible to SPARQL queries. As an example of the utility of DBpedia, the BBC has commissioned a pilot project called Muddy Boots, which enhances BBC stories by allowing users to link through to more information about referenced people, places, and things. The BBC has taken this concept a step further by integrating these Semantic Web ideas into its BBC Music service, which offers in-depth linked information about artists featured on BBC Radio. By following the thread from Wikipedia to DBpedia, and then to applications like Muddy Boots and BBC Music, we get a taste of the enhanced Web experience possible through semantics.
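As a rough sketch of what "accessible to SPARQL queries" means in practice, the snippet below prepares an HTTP request for a DBpedia SPARQL endpoint using only Python's standard library. Several details are assumptions that do not come from this article: the endpoint URL https://dbpedia.org/sparql, the choice of the resource The_Beatles as a sample artist, and the helper name `build_request`. The request is only built, not sent, and what the endpoint returns may vary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint; not stated in the article.
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for up to ten property/value pairs describing one resource.
# The resource URI is an illustrative choice.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/The_Beatles> ?property ?value .
} LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that passes the query in the 'query' parameter.

    The Accept header asks for the standard SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` and decoding the response with `json.load` would be one way to retrieve the bindings, assuming the endpoint is reachable and honors the requested format.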
For an in-depth description of the technology behind DBpedia, read this academic paper.
Freebase
One of the more hyped applications in this arena is Freebase, a startup supported by venture capital firms including Goldman Sachs and Benchmark Capital. (Read the coverage about Freebase in The New York Times.) In a Huffington Post column, Esther Dyson describes Freebase as "a tool to represent the world in a way that can be understood by computers as well as by people...Rather than present information to humans so that they can figure out what to do with it, it represents information in a way that lets computers manipulate it." And, as we learned, this is the central mission of the entire concept of the Semantic Web.
One look at Freebase illustrates how it differs from standard search engines. A Google search on the term evolution will present everything from the Wikipedia entry to a YouTube video of a commercial for Dove's evolution skin-care product, with no context and little order. The same query in Freebase will offer a structured list of different contexts for the search term, categorizing it as either a field of study, a topic, a journal, a book, a movie, or a musical group. Select one of these categories, and Freebase will present a clean, elegant page with a short description, links to relevant literature, a list of related topics, a list of people who have contributed to the maintenance of this topic, and a series of associated graphics. The richness of Freebase's search results is so compelling that it forms the basis of Microsoft's new Bing "decision engine," supplying both the "infoboxes" that appear to Bing searchers and the galleries of associated graphics.
Applications of Semantic Web concepts are exploding and have implications for users, Web developers, and IT managers; examples include Data.gov, the Obama White House's effort to use semantics to enhance citizen access to government data, and Open Calais, a project of Thomson Reuters that creates semantic metadata for documents submitted by users.
Compare Google to Freebase, and then think about your Web properties. The use of interlinked, metatagged data to create a Web of rich, accessible, open data that can be interpreted not only by humans but also by applications stretches the boundaries of our understanding about what can be achieved on the Web and is sure to impact every IT professional's ideas about Web development.
An understanding of these concepts, and innovative thinking about their application in your organization, will be a differentiating factor as we move beyond the World Wide Web into the Semantic Web world.