XML has evolved into a viable alternative for representing data. As more applications use XML, the big question becomes how to combine XML with relational databases. Let’s dive deeper into the issues involved in combining XML with databases and look at how all that data can be stored and queried.
XML database types
There are two categories to consider when deciding which type of XML database fits a particular application:
- Data-centric: Products that actually store the data or content in non-XML format
- Document-centric: Products that store complete XML documents in relational tables or on disk in file structures
Data-centric databases store data separate from the XML schema, usually just transforming the original content into relational tables. These products are referred to as XML-enabled databases. If an XML document is needed, the data stored in relational tables can be queried and an XML document created. Most major relational databases (Sybase, Oracle, and SQL Server) fall into this category.
Document-centric databases store the entire XML document in a relational, text, or proprietary format. These are called native XML databases. A couple of popular native XML databases are the Xindice (zeen-dee-chay) open source product from Apache and eXist, which is also open source.
Querying, XML style
Support for XPath or XML queries is a primary feature in XML databases. The major relational database vendors provide XPath support, while native XML databases provide support for querying with XPath, usually via the XML:DB API. Finding developers who understand XPath, much less database administrators, is a problem. However, for a simply structured database or hierarchical database, or for XML documents, XPath is more efficient than SQL. Unfortunately, the necessary string and date functions to manipulate the results don’t exist in SQL (String and Date functions are used in the XSLT code). For more complete queries, XML Query is more like SQL but is less supported.
For example, the SQL query below can’t be represented with XPath:
SELECT left(name,3) from employees
However, the following SQL:
SELECT * FROM employees WHERE left(name,3) = ‘hoo’
can be queried with something like:
//employees/name[starts-with(last,’hoo’)]
and with this XML Query:
for $t in document(“employeeList.xml”)//(employee)/name where contains($t/text(), “hoo”) return $t }
Bear in mind that XML:DB and XPath are more efficient at querying XML documents, not relational data structures.
XML support in relational databases
XML-enabled applications support creating information as XML, and reading XML is an important feature. Large vendors like Microsoft, Oracle, and IBM (and more) have succeeded in transforming relational data into XML and have XPath or XML Query implementations. Each platform also offers tools to compliment its database offerings. Programming is often required to maintain XML content, and SDKs (Software Development Kits) are available.
Most systems (excluding XML databases) lack methods to directly import or read XML documents. You can program SQL Server 2000 with stored procedures to import XML content directly into one or more tables. DB2 and Oracle have similar functionality.
XML-centric applications, such as BizTalk Server and XML Spy, do a much better job at reading XML documents since they act as a bridge between the XML and database. However, these programs require a serious commitment and substantial investment. One reason that these applications import XML so easily is that they support XML Schemas and data type definitions (DTDs). As more mature XML applications are developed, translating XML data (reading and writing) based on a DTD or Schema will allow much more flexibility in how the data can be used, because the DTD or Schema is easily mapped to relational tables or the needed data model.
Using XML on a small scale
Many smaller applications employ programming techniques to enable them to use XML as a data store for messaging or for storing smaller amounts of persistent data (5-10 MB). Applications like these can take large XML documents and import the data into a relational database. A DTD or XML Schema is available for validating the document.
For many applications, the programming is already done. They supply a COM object or Java package that processes the document and populates some type of programming object (a class or grid control). I recently implemented a component that locates product warranty information online using a supplied COM object that queries an XML file and populates another object. Not having to convert or import more data into a database is a nice benefit.
Document content and Web pages
Content delivered on Web sites is still basically stored in static HTML pages or relational databases, even though this type of “informational” content is probably best suited for XML. One of the more popular products that stores such content in XML is Cocoon from Apache. Enhydra is another Java/XML-based application server, and the eXist and Xindice database products easily integrate with Cocoon.
For catalogs, documents, and other data, XML delivers on the promise of an efficient data store and transport. More content is available with native XML databases or XML-enabled relational databases. Web sites and online content will benefit most from a native XML database. For more information, check out the XML:DB Web site.
XML doesn’t solve everything
Will XML become the standard? Probably not. With all the different types of applications, databases, and data types in the world, a universal XML-based data storage and retrieval application is impossible. Of the XML databases on the market, most are proprietary or commercial systems that can’t be accessed so easily across different clients and networks. XML:DB is an open standard to access XML databases, and many products do support this standard, including eXist and Xindice. Making XML work with existing databases will be an issue for some time.