An introduction to Tim Berners-Lee's Semantic Web

The need to search and interpret machine-understandable data on the Web is becoming a high priority in a variety of industries. This article discusses Tim Berners-Lee's vision of the Semantic Web and how it will help the Web take a giant leap forward and open up entirely new fields of opportunity.

"The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the Web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine process-able form." (Tim Berners-Lee, The Semantic Web Roadmap)

For Tim Berners-Lee, whom many recognize as the true inventor of the World Wide Web as we know it, the Semantic Web has been 15 years in the making.

What is the Semantic Web? The Semantic Web is a long-term project started by the W3C. Its stated purpose is to realize the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications (from the W3C Semantic Web Activity Statement). The Semantic Web is a Web technology that sits on top of the existing Web: it adds machine-readable information to files without modifying the existing Web structure.

In their current form, raw HTML text and images carry meaning that a human readily understands but that means little or nothing to computer programs. For instance, popular search engines can help you locate files containing specific words, but that content may not be what you're looking for: a page can match the words you searched on and still pertain to a different topic than you had in mind. Nor can the search engine connect a result to related content a few steps down the relationship path. The characters 95495 could mean a dryer belt, a ZIP code, a street address, and so on. Human language operates efficiently when the same term means somewhat different things, but automation does not.

In another example, let's say you were doing research on a (fictitious) CEO named Attilio Russo. A standard text search looks for documents that contain the string Attilio Russo, along with some fuzzy logic to find partial matches. In a Semantic Web model, semantic searches would look for documents on the Web with relationships to that data, then compile and organize those relationships to give you things like a list of companies Attilio previously worked for, the boards of directors of those companies, the companies those board members worked for, and so on. This would allow a computer to form relationships from data on the Web in a way that only humans can today.
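The relationship-following search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facts, the relation names (`formerly_worked_for`, `has_board_member`), and every person and company other than the fictitious Attilio Russo are invented here, and a real Semantic Web search would gather such statements from documents on the Web rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical statements of the form (subject, relation, object).
# All names and relations below are invented for illustration.
FACTS = [
    ("Attilio Russo", "formerly_worked_for", "Acme Corp"),
    ("Attilio Russo", "formerly_worked_for", "Globex"),
    ("Acme Corp", "has_board_member", "Dana Lee"),
    ("Globex", "has_board_member", "Sam Ortiz"),
    ("Dana Lee", "formerly_worked_for", "Initech"),
    ("Sam Ortiz", "formerly_worked_for", "Acme Corp"),
]


def build_index(facts):
    """Group objects by (subject, relation) so each hop is a dictionary lookup."""
    index = defaultdict(list)
    for subject, relation, obj in facts:
        index[(subject, relation)].append(obj)
    return index


def follow(index, start, relations):
    """Follow a chain of relations from `start` and return the sorted endpoints."""
    frontier = {start}
    for relation in relations:
        frontier = {
            obj
            for node in frontier
            for obj in index.get((node, relation), [])
        }
    return sorted(frontier)


INDEX = build_index(FACTS)

# One hop: companies Attilio previously worked for.
companies = follow(INDEX, "Attilio Russo", ["formerly_worked_for"])

# Two hops: board members of those companies.
board = follow(INDEX, "Attilio Russo", ["formerly_worked_for", "has_board_member"])

# Three hops: companies those board members worked for.
their_companies = follow(
    INDEX,
    "Attilio Russo",
    ["formerly_worked_for", "has_board_member", "formerly_worked_for"],
)
```

A plain string search would stop at pages that mention the name. Here each extra entry in the relation list moves the search one step further along the relationship path.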

The Semantic Web is designed to allow reasoning and inference capabilities to be added to the pure descriptions. In its simplest form, this includes stating facts such as "a hex-head bolt is a type of machine bolt," but it extends to the formation of complicated relationships. Features like this allow intelligent software to act on the descriptive information and follow logic paths based on it.
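As a rough sketch of what such inference might look like, the Python below stores "is a type of" facts and derives new ones by following the chain. Only the hex-head bolt fact comes from the text above. The other two facts, and the names `SUBTYPE_OF`, `ancestors`, and `is_a`, are assumptions made up for this example.

```python
# Each key "is a type of" its value. Only the first fact comes from the
# article; the other two are assumptions added to show chaining.
SUBTYPE_OF = {
    "hex-head bolt": "machine bolt",
    "machine bolt": "bolt",
    "bolt": "fastener",
}


def ancestors(term, subtype_of):
    """Return every broader type of `term`, nearest first."""
    found = []
    seen = {term}
    while term in subtype_of:
        term = subtype_of[term]
        if term in seen:  # guard against accidental cycles in the facts
            break
        seen.add(term)
        found.append(term)
    return found


def is_a(term, candidate, subtype_of):
    """True if `term` is `candidate` or a (possibly indirect) subtype of it."""
    return term == candidate or candidate in ancestors(term, subtype_of)


# Nobody stated "a hex-head bolt is a fastener" directly;
# the program infers it from the three stated facts.
inferred = is_a("hex-head bolt", "fastener", SUBTYPE_OF)
```

The point of the sketch is that software can reach a conclusion that was never written down, simply by following stated relationships.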

The two most important technologies for developing the Semantic Web are Extensible Markup Language (XML) and the Resource Description Framework (RDF). XML allows content creators to label information in a meaningful way (for example, <Dryer><Part>95405</Part></Dryer>). Programs can make use of these tags in sophisticated ways, but the program has to know what the content creator uses each tag for. In summary, XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.
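The limitation described above is easy to see in code. The sketch below uses Python's standard `xml.etree.ElementTree` module to read the dryer example. The second document, with its `Appliance` and `ItemNo` tags, is a hypothetical alternative labeling invented here to show what happens when another author picks different tag names.

```python
import xml.etree.ElementTree as ET


def part_number(xml_text):
    """Return the text of a direct <Part> child, or None if there is none.

    The function only works because its author already knows that the
    content creator uses the tag name "Part" for part numbers.
    """
    root = ET.fromstring(xml_text)
    node = root.find("Part")
    return node.text if node is not None else None


# The labeling used in the article.
dryer_doc = "<Dryer><Part>95405</Part></Dryer>"

# A hypothetical second author labels the same information differently.
other_doc = "<Appliance><ItemNo>95405</ItemNo></Appliance>"

found = part_number(dryer_doc)    # the tag name matches, so a value comes back
missing = part_number(other_doc)  # same data, different tag, nothing found
```

Both documents are well-formed XML and carry the same number, yet the program understands only the one whose tag names it was written for.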

This leads us to RDF, the Resource Description Framework. RDF expresses meaning by encoding it in triples, each made up of a subject, a property, and an object, much like a simple sentence. The W3C developed RDF to facilitate interoperability among applications that generate and process machine-understandable representations of data resources on the Web. In RDF, a document makes assertions that particular things have properties (such as "is a brother of," "is the CEO of") with certain values. This structure turns out to be a natural way to describe the majority of data processed by machines. Within this structure, the subject and object are each identified by a Uniform Resource Identifier (URI), similar to the concept of a link on a Web page. (URLs, Uniform Resource Locators, are the most common type of URI.)
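To make the shape of such assertions concrete, here is a minimal Python sketch that models them as plain data rather than in any real RDF syntax or library. The `Triple` class, the `objects` helper, and every `http://example.org/` URI are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: a subject has a property with a given value (object)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; example.org is reserved for documentation use.
EX = "http://example.org/"

ATTILIO = EX + "people/attilio-russo"
IS_CEO_OF = EX + "terms/isCEOOf"
IS_BROTHER_OF = EX + "terms/isBrotherOf"

# Invented assertions in the spirit of "is the CEO of" and "is a brother of".
TRIPLES = [
    Triple(ATTILIO, IS_CEO_OF, EX + "companies/acme"),
    Triple(ATTILIO, IS_BROTHER_OF, EX + "people/marco-russo"),
]


def objects(triples, subject, predicate):
    """Return the values asserted for `predicate` about `subject`."""
    return [
        t.obj
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


companies_led = objects(TRIPLES, ATTILIO, IS_CEO_OF)
```

Because each thing is named by a URI rather than a bare string, two documents that use the same URI are talking about the same thing, which is what lets separate sources be combined.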

Even with the above framework in place, two databases may use different identifiers for what is in fact the same concept. A program that wants to compare or combine information across the two databases has to know that these two terms mean the same thing. The program must have a way to discover such common meanings for whatever databases it encounters.

Ontologies provide the solution to this problem. In philosophy, ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories. Artificial-intelligence and Web researchers have co-opted the term for their own jargon, and for them the term ontology refers to a document or file that formally defines the relations among terms.
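One small piece of what an ontology can provide is a statement that two identifiers mean the same thing. The Python sketch below assumes two hypothetical databases that call a postal code `zip_code` and `postcode`. The field names, the records, and the `EQUIVALENT_TO` table are all invented for illustration, and real ontologies define far richer relations than simple equivalence.

```python
# A tiny stand-in for an ontology: each local identifier is mapped to
# the shared concept it denotes. All names here are hypothetical.
EQUIVALENT_TO = {
    "zip_code": "postal_code",  # identifier used by database A
    "postcode": "postal_code",  # identifier used by database B
}


def normalize(record, equivalences):
    """Rename each field to its shared concept, leaving unknown fields as-is."""
    return {equivalences.get(key, key): value for key, value in record.items()}


# The same fact as stored by two databases with different identifiers.
record_a = {"name": "Acme Corp", "zip_code": "00000"}
record_b = {"name": "Acme Corp", "postcode": "00000"}

# Once both records use the shared vocabulary, a program can compare them.
same_fact = normalize(record_a, EQUIVALENT_TO) == normalize(record_b, EQUIVALENT_TO)
```

Without the mapping, the two records look unrelated to a program. With it, the program can discover that they state the same thing.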

The Semantic Web will advance the relational database model and overturn old ways of organizing information, according to Berners-Lee. Rather than listing information in tree structures, it will create a Web based on the relationships of people, places and things as they exist in the real world.

He expressed the belief that Semantic Web technology will advance the information revolution he began with the World Wide Web, changing everything from how users set up their online address books to how they pay their taxes.

There is a lot more to the concept of the Semantic Web than can be realistically included in one article. You'll find an in-depth knowledge base on the subject at http://www.w3.org/2001/sw/.
