“The Web was designed as an information space,
with the goal that it should be useful not only for human-human communication,
but also that machines would be able to participate and help. One of the major
obstacles to this has been the fact that most information on the Web is
designed for human consumption, and even if it was derived from a database with
well defined meanings (in at least some terms) for its columns, that the
structure of the data is not evident to a robot browsing the Web. Leaving aside
the artificial intelligence problem of training machines to behave like people,
the Semantic Web approach instead develops languages for expressing information
in a machine processable form.” (Tim Berners-Lee, The Semantic Web Roadmap)
An introduction to Tim Berners-Lee’s Semantic Web
For Tim Berners-Lee, whom many recognize as the true inventor
of the World Wide Web as we know it, the Semantic Web has been 15 years in the
making.
What is the Semantic Web? The Semantic Web is the name of a
long-term project started by the W3C with the stated purpose of realizing the idea
of having data on the Web defined and linked in a way that it can be used by
machines not just for display purposes, but for automation, integration, and
reuse of data across various applications (from the W3C Semantic Web Activity Statement).
The Semantic Web is a Web technology
that sits on top of the existing Web: it adds machine-readable
information to files without modifying the existing Web structure.
In the Web's current form, raw HTML text and images carry
information that is readily understandable by a human, but has little to
no meaning to computer programs. For instance, popular search engines can help
you locate files containing specific words, but a file that matches the words
you searched for may pertain to a different topic than you had in mind, so the
result will not be what you intended. Nor can the search engine connect a
result to related content a few steps down the relationship path. The
characters 95495 could mean a dryer
belt, a zip code, a street address, etc. Human language can operate efficiently
when the same term means somewhat different things, but automation
cannot.
In another example, let’s say you were doing research on a CEO named Attilio Russo
(fictitious). A standard keyword search will look for documents that contain
the string Attilio Russo (along with some fuzzy matching to find partial
matches, etc.). In a Semantic Web model, there would be semantic searches that
look for documents on the Web with relationships to that data, and that would
then compile and organize the relationships and give you things like a list of
previous companies Attilio worked for, the board of directors of those companies,
companies those board members worked for, etc. This would allow a computer to
form relationships from data on the Web in a way that only humans can at
present.
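The kind of relationship-following search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facts are stored as simple (subject, relation, object) tuples, and every company name, board member, and relation name below is invented for the example.

```python
# Hypothetical facts as (subject, relation, object) tuples.
# Every name other than Attilio Russo is invented for this sketch.
FACTS = [
    ("Attilio Russo", "worked_for", "Acme Corp"),
    ("Attilio Russo", "worked_for", "Globex"),
    ("Acme Corp", "has_board_member", "Maria Bianchi"),
    ("Globex", "has_board_member", "Luca Conti"),
    ("Maria Bianchi", "worked_for", "Initech"),
    ("Luca Conti", "worked_for", "Acme Corp"),
]


def objects(subject, relation, facts=FACTS):
    """Return every object linked to `subject` by `relation`, in input order."""
    return [o for s, r, o in facts if s == subject and r == relation]


def related_companies(person, facts=FACTS):
    """Follow person -> companies -> board members -> their companies."""
    report = {}
    for company in objects(person, "worked_for", facts):
        board = objects(company, "has_board_member", facts)
        report[company] = {m: objects(m, "worked_for", facts) for m in board}
    return report


report = related_companies("Attilio Russo")
```

A plain string search could only return documents mentioning the name. Here the program walks explicit relationships, which is the step the Semantic Web aims to make possible across the whole Web.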
The Semantic Web is designed to allow
reasoning and inference capabilities to be added to the pure
descriptions. In its simplest form, this includes stating facts such as “a
hex-head bolt is a type of machine bolt,” but extends to the formation of
complicated relationships. Features like this allow intelligent software to act
on this descriptive information and follow logic paths based on it.
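As a minimal sketch of such inference, the Python below chains “is a type of” facts to reach conclusions that were never stated directly. Only the hex-head bolt fact comes from the text; the other two facts and the function names are assumptions made for illustration.

```python
# "Is a type of" facts. Only the first pair comes from the article;
# the other two are hypothetical additions for this sketch.
SUBTYPE_OF = {
    "hex-head bolt": "machine bolt",
    "machine bolt": "bolt",
    "bolt": "fastener",
}


def ancestors(term, subtype_of=SUBTYPE_OF):
    """Return every broader type of `term`, nearest first."""
    found = []
    while term in subtype_of:
        term = subtype_of[term]
        if term in found:  # guard against accidental cycles in the data
            break
        found.append(term)
    return found


def is_a(term, candidate, subtype_of=SUBTYPE_OF):
    """True if `candidate` can be inferred as a broader type of `term`."""
    return candidate in ancestors(term, subtype_of)
```

With these facts, `is_a("hex-head bolt", "fastener")` holds even though no single fact says so; the program follows the logic path one step at a time.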
The two most important technologies for developing the Semantic Web are eXtensible
Markup Language (XML) and the Resource Description Framework (RDF). XML allows content
creators to label information in a meaningful way (e.g.,
<Dryer><Part>95405</Part></Dryer>). Programs can make
use of these tags in sophisticated ways, but the program has to know what the
content creator uses each tag for. In short, XML allows users to add
arbitrary structure to their documents but says nothing about what the
structures mean.
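The limitation is easy to see in a short Python sketch that uses the standard library's XML parser. The first document is the example from the text; the second, with its `Appliance` and `Component` tags, is a hypothetical author's markup for the same idea.

```python
import xml.etree.ElementTree as ET

# The example from the text: the parser recovers the structure reliably.
doc = ET.fromstring("<Dryer><Part>95405</Part></Dryer>")
part = doc.find("Part")
print(doc.tag, part.tag, part.text)

# A second author might choose different tags for the same idea
# (hypothetical markup). Nothing in XML says the two are equivalent,
# so code written against <Part> simply finds nothing here.
other = ET.fromstring("<Appliance><Component>95405</Component></Appliance>")
missing = other.find("Part")
print(missing)
```

The program handles the first document only because its author already knew what `Part` was used for. That knowledge lives in the programmer's head, not in the XML.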
This leads us to RDF, the Resource Description Framework. RDF expresses meaning,
and its statements can be written using XML tags. The W3C developed this new
logical language to facilitate
interoperability of applications which generate and process
machine-understandable representations of data resources on the Web. In RDF, a
document makes assertions that particular things have properties (such as
“is a brother of,” “is the CEO of”) with certain values. Each assertion is a
triple of subject, property, and object, rather like the subject, verb, and
object of a simple sentence. This
structure turns out to be a natural way to describe the majority of data
processed by machines. Within this structure, the things described and the
properties that link them are identified by Uniform Resource Identifiers
(URIs), similar to the concept of
a link on a Web page; an object can also be a plain literal value such as a
number or a string. (URLs, Uniform Resource Locators, are the most common
type of URI.)
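To make the structure concrete, the sketch below models an RDF assertion as a subject, property, object triple in plain Python and prints each one as a line in the style of the N-Triples format. It does not use an RDF library, it handles URIs only (no literal values), and every `http://example.org/` URI is a placeholder invented for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str  # the property linking subject and object
    obj: str


EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

triples = [
    Triple(EX + "people/attilio-russo", EX + "terms/isCeoOf",
           EX + "companies/acme"),
    Triple(EX + "people/attilio-russo", EX + "terms/isBrotherOf",
           EX + "people/marco-russo"),
]


def to_ntriples(t):
    """Serialize one URI-only triple as an N-Triples-style line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


lines = [to_ntriples(t) for t in triples]
for line in lines:
    print(line)
```

Because each part of the statement is a URI rather than a bare word, two documents that use the same URI are unambiguously talking about the same thing.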
Even with the above framework in place, two databases may
use different identifiers for what is in fact the same concept. A program that
wants to compare or combine information across the two databases has to know
that these two terms mean the same thing. The program must have a way to
discover such common meanings for whatever databases it encounters.
Ontologies provide the solution to this problem. In
philosophy, ontology is a theory
about the nature of existence, of what types of things exist; ontology as a
discipline studies such theories. Artificial-intelligence and Web researchers
have co-opted the term for their own jargon, and for them the term ontology refers
to a document or file that formally defines the relations among terms.
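One small part of what an ontology can state is that two identifiers name the same concept. The Python sketch below assumes two hypothetical databases with invented `http://example.org/` identifiers and shows how a declared equivalence lets a program recognize matching records. Real ontology languages express far richer relations than this one-to-one mapping.

```python
# Pairs of identifiers declared equivalent (all URIs are hypothetical).
# The first item of each pair is treated as the canonical form.
EQUIVALENT = [
    ("http://example.org/db1/zip", "http://example.org/db2/postalCode"),
    ("http://example.org/db1/ceo", "http://example.org/db2/chiefExecutive"),
]

CANONICAL = {alias: canonical for canonical, alias in EQUIVALENT}


def normalize(record):
    """Rewrite a record's keys to their canonical identifiers."""
    return {CANONICAL.get(key, key): value for key, value in record.items()}


rec1 = {"http://example.org/db1/zip": "95495"}
rec2 = {"http://example.org/db2/postalCode": "95495"}

# Without the mapping the records look unrelated; with it they match.
same_before = rec1 == rec2
same_after = normalize(rec1) == normalize(rec2)
```

Here `same_before` is false and `same_after` is true: the equivalence declaration, not the program's own logic, supplies the knowledge that the two terms mean the same thing.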
The Semantic Web will advance the relational database model and overturn old ways
of organizing information, according to Berners-Lee. Rather than listing
information in tree structures, it will create a Web based on the relationships
of people, places and things as they exist in the real world.
He expressed the belief that Semantic Web technology will advance the information
revolution he began with the World Wide Web, changing everything from how users
set up their online address books to how they pay their taxes.
There is a lot more to the concept of the Semantic Web than
can be realistically included in one article. You’ll find an in-depth knowledge
base on the subject at http://www.w3.org/2001/sw/.