Although most pundits agree that XML will become the standard for data storage and retrieval in the next decade, they can’t agree on how it will be implemented.
Many believe that current database systems (like Oracle, SQL Server, and Sybase) will migrate their engines to store native XML, and that manufacturers will build optimization, processing, and manipulation capabilities onto this new engine. Others believe that database manufacturers will leave engines intact and simply add an XML layer—allowing the engines to consume and emit the underlying data based on queries from existing XML languages like eXtensible Stylesheet Language Transformations (XSLT).
Both theories are based on the assumption that we need the query language of an underlying database in order to perform sophisticated queries against XML files. In this article, I’ll explain why this assumption is faulty and how the W3C XML Query working group’s draft specification for an XML query language (called XQuery) will not only serve as a query language for XML, but will work against relational sources as well.
Getting a grasp of the language
With more data passed around as XML, and more systems designed to produce it, developers need a way to query XML sources for specific pieces of data.
The first standard approach to accessing these XML data sources was XML Path Language (XPath). XPath was designed for navigation within an XML file and simple queries of a single file. Because of that single-source design, using XPath to query multiple data sources requires the developer to first perform complex XML document merges using XSLT or custom programs. This approach is similar to how some companies create data warehouses today: data from multiple sources is pulled together and transformed into an identical format in a central warehouse repository. Managers can then use that repository’s tools to query the data.
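To make the single-document scope concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module, which implements only a subset of XPath. The catalog, its element names, and the `sku` attribute are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical partner catalog, inlined so the sketch is self-contained.
CATALOG_XML = """
<catalog partner="Acme">
  <product sku="A1"><name>Widget</name><price>9.50</price></product>
  <product sku="B2"><name>Gadget</name><price>24.00</price></product>
  <product sku="C3"><name>Sprocket</name><price>3.25</price></product>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)

# Navigation: collect every product name found under the root.
all_names = [name.text for name in root.findall(".//product/name")]

# Simple query: select the one product whose sku attribute is "B2".
gadget = root.find(".//product[@sku='B2']")
gadget_price = gadget.findtext("price")

print(all_names)
print(gadget_price)
```

Both expressions are evaluated against one parsed document. Asking the same question of a second catalog would mean parsing and merging it first, which is the extra work described above.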
XQuery was designed to solve this problem by allowing complex queries not only across multiple XML documents, but also across combinations of XML documents, relational databases, object repositories, and other unstructured documents. Going forward, XPath will focus on navigation capabilities (i.e., linking between documents or accessing a specific portion of a document), and the eXtensible Stylesheet Language (XSL) working group will be studying ways to incorporate XQuery-based queries in the XSL standard. This would create a powerful tool to search, aggregate, and present data from disparate sources using a unified query language (XQuery) and a powerful transformation and display formatting language (XSL).
XQuery is a very rich querying language. It understands data and data types, allowing complex value queries like “less than” or range queries. It has primitives that allow iteration through a data source, as well as sorting, aggregation, and grouping functions. It allows connectivity between sources and restructuring of documents based on defined criteria. More importantly, it includes a standard mechanism for extending the language with custom functions. Just as relational databases have their own query and stored procedure languages, XQuery provides similar functionality, except that it will work across both XML data sources and relational data sources.
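The operations listed above (iteration, typed comparisons, sorting, aggregation, and grouping) can be illustrated with a rough Python analogue. This is not XQuery syntax, only a sketch of the same kinds of operations written by hand. The order data, element names, and function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal

# Hypothetical order data; element and attribute names are illustrative only.
ORDERS_XML = """
<orders>
  <order region="east"><item>Widget</item><total>120.00</total></order>
  <order region="west"><item>Gadget</item><total>45.50</total></order>
  <order region="east"><item>Sprocket</item><total>80.25</total></order>
  <order region="west"><item>Flange</item><total>210.00</total></order>
</orders>
"""


def items_in_range(root, low, high):
    """Iterate over orders, keep totals within [low, high], sort largest first."""
    matches = []
    for order in root.iter("order"):
        # Treat the text as a number so the comparison is by value, not by string.
        total = Decimal(order.findtext("total"))
        if low <= total <= high:
            matches.append((total, order.findtext("item")))
    return [item for total, item in sorted(matches, reverse=True)]


def totals_by_region(root):
    """Group orders by their region attribute and sum the totals per group."""
    sums = defaultdict(Decimal)  # Decimal() starts each group at zero
    for order in root.iter("order"):
        sums[order.get("region")] += Decimal(order.findtext("total"))
    return dict(sums)


root = ET.fromstring(ORDERS_XML)
in_range = items_in_range(root, Decimal("80"), Decimal("250"))
by_region = totals_by_region(root)
print(in_range)
print(by_region)
```

Converting the text to `Decimal` before comparing stands in for the data-type awareness described above: a string comparison would rank "80.25" above "210.00".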
Given its ability to query across multiple documents and data formats, XQuery would play a key role in the following scenarios:
- Business partners send catalogs in XML format, and you want to query the catalogs to compare prices of common products.
- Your intranet has data stored in relational databases as well as extracts from other file systems stored in XML files, and you want to build a portal service that categorizes and displays information from these disparate sources.
- You have product information in Word documents, product pricing in a relational database, and product incident history archived in XML files on the internal system. By using XQuery and XSL, you could create a customer extranet that presents a unified view of these three repositories without having to move any of the underlying data. Existing systems run unmodified, but you have a powerful new view of the data for your customers.
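As an illustration of the last scenario, the sketch below hand-codes a join between a relational pricing table and an XML incident archive, the kind of glue code a cross-source query language aims to replace with a single query. It assumes an in-memory SQLite database and a small inline XML document. The table, column, element, and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incident archive; in practice this would live in XML files.
INCIDENTS_XML = """
<incidents>
  <incident sku="A1">Cracked housing</incident>
  <incident sku="A1">Loose fitting</incident>
  <incident sku="C3">Late shipment</incident>
</incidents>
"""


def load_pricing():
    """Create a throwaway relational pricing table (stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pricing (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO pricing VALUES (?, ?, ?)",
        [("A1", "Widget", 9.5), ("B2", "Gadget", 24.0), ("C3", "Sprocket", 3.25)],
    )
    return conn


def unified_view(conn, incidents_root):
    """Join relational pricing rows with XML incident counts on the shared sku."""
    counts = {}
    for incident in incidents_root.iter("incident"):
        sku = incident.get("sku")
        counts[sku] = counts.get(sku, 0) + 1
    rows = conn.execute("SELECT sku, name, price FROM pricing ORDER BY sku")
    return [
        {"sku": sku, "name": name, "price": price, "incidents": counts.get(sku, 0)}
        for sku, name, price in rows
    ]


conn = load_pricing()
view = unified_view(conn, ET.fromstring(INCIDENTS_XML))
for row in view:
    print(row)
```

Neither source is copied or converted into the other's format. Each is read in place and combined only in the result, which is the "unified view" idea from the scenario.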
Is it too early to use XQuery?
It’s important to understand that unlike XML, XML Schema Definition (XSD), and XSLT, XQuery is a draft and not a recommendation. As such, the vendor community is just beginning to create tools that make it easy to write XQuery programs. Currently, there are no graphical tools that generate the underlying XQuery syntax automatically, so the only recourse is to create and debug queries by hand in a simple text editor like Notepad.
This brings up the broader issue of when to adopt new technologies based on W3C standards (i.e., whether you should wait for the recommendation, or start using the technology at the draft stage).
When making this decision, you should use the same criteria as you would for beta code. For example, do the business benefits you would derive outweigh the cost of coding parts of the application over again when the recommendation becomes final? Some of this risk may be mitigated if you’re using a vendor’s tools, as the vendor may continue to support a draft version of the specification in order to release its tool before the formal recommendation. Microsoft’s use of XML-Data Reduced (XDR) is a good example of this.
While the W3C was working to finalize the XSD recommendation, there were several competing schema proposals, one of which was XDR. XDR was a reduced subset of the earlier XML-Data proposal, and it turned out to be incompatible with the final XSD specification. Microsoft’s BizTalk relied on XDR, and its tools generated XDR schemas. Rather than waiting for the final XSD specification, Microsoft released an XDR-dependent version of BizTalk. But since customers use the BizTalk tools to create applications, Microsoft will be able to replace XDR with a fully compliant implementation of XSD in the future, including porting tools to upgrade existing applications. I expect that many applications will port easily, but some will require retooling in order to more fully support XSD.
Worthy of consideration
XQuery is shaping up to be a very powerful language for analyzing both XML and non-XML data sources. Although it’s in its infancy and not ready for widespread production use, it merits investigation and consideration as a potential tool in your company’s data manipulation arsenal.
If the standard and the vendor tools based on the standard can deliver on the promise of XQuery, then we may have truly found the Holy Grail of query languages. Even so, it’s likely that each vendor’s proprietary query language will be screaming “I’m not dead yet” for years to come.