Clearing the confusion about XML databases

Don't be confused by XML databases. Get the details on this much-discussed technology, including the different types of databases and their characteristics.

When developers talk about XML databases, they usually mean two things: the database that stores the XML data and the DBMS that manipulates the XML database. Most major DBMS products allow you to integrate XML data into your application without changing your existing database. Let’s discuss the types of XML databases and explore the characteristics of each.

Native XML databases
A native XML database (NXD) can be simple or complex. I define a database as a collection of data that is persistent. Under this definition, an NXD would logically store an XML document. While the XML:DB initiative has more stringent requirements for an NXD, I think my definition is sufficient for this discussion.

The following qualifies as an NXD:
<?xml version="1.0"?>
<meal mealName="breakfast">
<item itemName="toast"  unit="slice" quantity="2" />
<item itemName="bacon"  unit="strip" quantity="2" />
</meal>

It's clearly a collection of data stored in an XML format. If it is stored in a flat file, it persists, and meets our definition of an NXD: a persistent collection of data.
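As a rough illustration of the "persistent collection of data" idea, the following Python sketch writes the breakfast document to a flat file and reads it back with the standard library's xml.etree.ElementTree. The temporary path, the file name breakfast.xml, and the closing </meal> tag are assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# The breakfast document, closed with </meal> so that it is well-formed.
MEAL_XML = """<?xml version="1.0"?>
<meal mealName="breakfast">
  <item itemName="toast" unit="slice" quantity="2" />
  <item itemName="bacon" unit="strip" quantity="2" />
</meal>"""

# Persist the document as a flat file (a temporary directory, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "breakfast.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write(MEAL_XML)

# Later, any XML parser can load the "database" again.
root = ET.parse(path).getroot()
items = [
    (item.get("itemName"), item.get("unit"), int(item.get("quantity")))
    for item in root.findall("item")
]
```

Nothing here manages the data beyond the file system itself, which is why this counts as the simplest possible case.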

An NXD uses familiar ways of storing the XML documents (e.g., flat files, relational or hierarchical databases, and object databases).

Flat files
A single flat file is the simplest model of an XML database. As a variation, you can store multiple XML documents in a directory hierarchy. To expand upon my previous example, take a look at the following model:


The directory, Diets, contains subdirectories and each subdirectory contains a number of XML flat files.
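A directory layout like this can be read with very little code. The sketch below assumes that each subdirectory is named after a diet and each file after a day, with a .xml extension; the function name load_diets is hypothetical, not part of any product.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_diets(root_dir):
    """Return {diet_name: {day_name: parsed XML root}} for a Diets-style tree."""
    diets = {}
    for diet_dir in sorted(Path(root_dir).iterdir()):
        if not diet_dir.is_dir():
            continue  # ignore stray files next to the diet subdirectories
        days = {}
        for xml_file in sorted(diet_dir.glob("*.xml")):
            # The file name (minus .xml) stands in for the day of the diet.
            days[xml_file.stem] = ET.parse(str(xml_file)).getroot()
        diets[diet_dir.name] = days
    return diets
```

Calling load_diets("Diets") would return one dictionary entry per subdirectory. Note that the diet name and the day live only in the file system, not in the XML itself.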

Relational databases
XML databases implemented in a relational database can be grouped into three models: coarse-grained, medium-grained, or fine-grained.

The coarse-grained model is logically close to the flat file model. You can use a relational database to store each XML document as a long text string. For example, the following table could store one XML document per row:
CREATE TABLE diets
( dietName varchar2(30),
  xmlDocument varchar2(32000)
);
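To show what working with such a table might look like, here is a small Python sketch that uses SQLite as a stand-in for the relational database. The TEXT column type, the diet name lowCarb, and the sample document are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# SQLite stand-in for the diets table; TEXT replaces varchar2.
conn.execute("CREATE TABLE diets (dietName TEXT, xmlDocument TEXT)")

doc = (
    '<meal mealName="breakfast">'
    '<item itemName="toast" unit="slice" quantity="2" />'
    "</meal>"
)
conn.execute("INSERT INTO diets VALUES (?, ?)", ("lowCarb", doc))

# The database sees only an opaque string; parsing happens in the application.
(stored,) = conn.execute(
    "SELECT xmlDocument FROM diets WHERE dietName = ?", ("lowCarb",)
).fetchone()
first_item = ET.fromstring(stored).find("item").get("itemName")
```

The trade-off is that SQL can find a document by dietName but cannot look inside the XML without help from the application.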

The fine-grained model maps each component of the XML document into the relational database. Moving the flat file model into a relational database requires two changes: you can no longer use the subdirectory name to indicate the name of the diet, and you can no longer use the file name to give the day of the diet. Instead, the diet name becomes an attribute and the day becomes an element. Listing A shows the Document Type Definition (DTD) for the diet XML document; Listing B shows the Data Definition Language (DDL) that maps the DTD to relational tables. The corresponding XML Schema is included in Listing C.
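The following Python sketch illustrates the general idea of fine-grained mapping, again with SQLite standing in for the relational database. The document shape, table names, and column names are assumptions made for this example and are not taken from the listings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed document shape: diet name as an attribute, day as an element.
DIET_XML = """<diet dietName="lowCarb">
  <day>monday</day>
  <meal mealName="breakfast">
    <item itemName="toast" unit="slice" quantity="2" />
    <item itemName="bacon" unit="strip" quantity="2" />
  </meal>
</diet>"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meals (meal_id INTEGER PRIMARY KEY, diet_name TEXT, day TEXT, meal_name TEXT);
CREATE TABLE items (meal_id INTEGER REFERENCES meals(meal_id),
                    item_name TEXT, unit TEXT, quantity INTEGER);
""")

diet = ET.fromstring(DIET_XML)
day = diet.findtext("day")
for meal in diet.findall("meal"):
    # One row per <meal>, then one row per <item> linked by meal_id.
    cur = conn.execute(
        "INSERT INTO meals (diet_name, day, meal_name) VALUES (?, ?, ?)",
        (diet.get("dietName"), day, meal.get("mealName")),
    )
    for item in meal.findall("item"):
        conn.execute(
            "INSERT INTO items VALUES (?, ?, ?, ?)",
            (cur.lastrowid, item.get("itemName"), item.get("unit"), int(item.get("quantity"))),
        )

rows = conn.execute(
    "SELECT m.meal_name, i.item_name, i.quantity "
    "FROM meals m JOIN items i ON i.meal_id = m.meal_id ORDER BY i.item_name"
).fetchall()
```

Once the document is split this way, every attribute is reachable with ordinary SQL, at the cost of reassembling the XML when you need the document back.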

A medium-grained model falls between the fine- and coarse-grained models (see Listing B). Using this model, you don't have a separate item table; you store that information in the xml_items column in the meals table.
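A medium-grained layout might be sketched as follows in Python with SQLite. Only the meals table and its xml_items column come from the description above; the other column names and the sample data are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meals (diet_name TEXT, day TEXT, meal_name TEXT, xml_items TEXT)"
)

meal = ET.fromstring(
    '<meal mealName="breakfast">'
    '<item itemName="toast" unit="slice" quantity="2" />'
    '<item itemName="bacon" unit="strip" quantity="2" />'
    "</meal>"
)
# Keep the <item> elements together as one XML fragment instead of separate rows.
xml_items = "".join(ET.tostring(item, encoding="unicode") for item in meal)
conn.execute(
    "INSERT INTO meals VALUES (?, ?, ?, ?)",
    ("lowCarb", "monday", meal.get("mealName"), xml_items),
)

# Meal-level columns are queryable with SQL; the items still need an XML parser.
(fragment,) = conn.execute(
    "SELECT xml_items FROM meals WHERE meal_name = ?", ("breakfast",)
).fetchone()
item_names = [i.get("itemName") for i in ET.fromstring(f"<items>{fragment}</items>")]
```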

Object databases
You can also implement an NXD using an object database. Some are based on the Document Object Model (DOM), which allows a tight coupling of XML to the database. For example, once you provide the NXD with the diet DTD, you can start storing diets with little or no additional configuration.

Database management systems
Using a DBMS to manipulate XML is where the real fun comes in. You have several choices in a DBMS. You can create your own using existing open source applications such as eXist and Ozone, or purchase commercial products such as the Tamino XML Server.

Most major relational database vendors have enabled their databases to use XML. At a bare minimum, XML-enabled databases (XEDBs) should return the results of a query as an XML document. Some can also store XML data. For example, Oracle provides an XML parser, an XPath engine, an XSLT processor, an XML SQL utility, and more ways of working with XML.
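The "bare minimum" behavior, returning query results as an XML document, can be imitated in a few lines of Python. This is a sketch of the concept using SQLite and the standard library, not any vendor's actual utility; the function name query_to_xml and its parameters are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, sql, root_tag="rows", row_tag="row"):
    """Run a SELECT and wrap each result row in an element, one attribute per column."""
    cur = conn.execute(sql)
    columns = [description[0] for description in cur.description]
    root = ET.Element(root_tag)
    for record in cur:
        # Column names become attribute names, so they must be valid XML names.
        ET.SubElement(root, row_tag, {c: str(v) for c, v in zip(columns, record)})
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemName TEXT, unit TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("toast", "slice", 2), ("bacon", "strip", 2)],
)
xml_result = query_to_xml(
    conn, "SELECT itemName, unit, quantity FROM items ORDER BY itemName", "meal", "item"
)
```

The data stays in ordinary relational tables; XML appears only at the boundary, which is the looser coupling that distinguishes an XEDB.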

So, what is the difference between an NXD that uses a relational database for storage and an XEDB that can retrieve and store XML data? An NXD is built around the concept of an XML document as a logical unit. The XEDB is not as tightly coupled. Furthermore, some argue that, to qualify as an NXD, the DBMS should allow only standard XML methods of querying and storing data, such as XPath.
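For a feel of what querying with XPath looks like, here is a short Python sketch. It relies on the limited XPath subset supported by the standard library's xml.etree.ElementTree, which is an assumption of convenience; a real NXD would expose a fuller XPath engine.

```python
import xml.etree.ElementTree as ET

meal = ET.fromstring(
    '<meal mealName="breakfast">'
    '<item itemName="toast" unit="slice" quantity="2" />'
    '<item itemName="bacon" unit="strip" quantity="2" />'
    "</meal>"
)

# XPath-style predicate: select <item> children whose unit attribute is "slice".
sliced = meal.findall("./item[@unit='slice']")
names = [item.get("itemName") for item in sliced]
```

The query addresses the structure of the document directly, with no tables or joins involved.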

XML databases not necessarily a new concept
XML databases are new as a logical view of a database, although they are not necessarily new in how that logical view is implemented. If you want to implement an XML database, you can use an NXD, an XEDB, or a hybrid of the two. However, if your company already has a substantial investment in an existing database, you should first research whether your DBMS vendor provides tools for using XML. No matter which form of XML database you use, it is important to understand the characteristics of each.



