Navigating XML documents with XPath

Sure, XML is great for passing data around. But when the data gets where it is going, how are you supposed to extract it to use in your legacy order entry system? Learn how XPath can make simple work of the task.

By Brian Schaffner

There are many approaches to navigating XML documents. Some solutions parse a document into a Document Object Model (DOM); others use the Simple API for XML (SAX); still others rely on Extensible Stylesheet Language Transformations (XSLT). Each approach tackles the problem of extracting data from an XML document in a different way. Here we'll look at the XML Path Language (XPath) and how you can use it within XSLT to navigate an XML document.

XPath basics
XPath is a language for addressing parts of an XML document. An XML document is made up of nodes such as elements and attributes, and with XPath you can select one or more of them. The nodes to select are identified by an XPath expression, which can include:
  • A path to a specific element.
  • A wildcard that selects multiple elements.
  • A function, such as count() or last(), that works with a set of selected nodes.
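To see these three kinds of expression outside an XSLT processor, here is a minimal sketch using Python's standard xml.etree.ElementTree module. That module implements only a limited subset of XPath. For example, it accepts last() inside a position predicate but has no count(). The order document below is a hypothetical stand-in for the article's data. ElementTree evaluates paths relative to the element you call it on, so the leading CustomerOrder step is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; element names follow the article, values are made up.
ORDER_XML = """
<CustomerOrder>
  <OrderInformation>
    <LineItems>
      <Item><SKU>A-100</SKU><Quantity>2</Quantity></Item>
      <Item><SKU>B-200</SKU><Quantity>1</Quantity></Item>
    </LineItems>
  </OrderInformation>
</CustomerOrder>
"""

root = ET.fromstring(ORDER_XML.strip())  # root is the <CustomerOrder> element

# 1. A path to specific elements (relative to <CustomerOrder>).
skus = [e.text for e in root.findall("OrderInformation/LineItems/Item/SKU")]

# 2. A wildcard: every child element of every Item, whatever its tag.
fields = [e.tag for e in root.findall("OrderInformation/LineItems/Item/*")]

# 3. A function: ElementTree supports last() inside a position predicate.
last_sku = root.findtext("OrderInformation/LineItems/Item[last()]/SKU")

print(skus)      # ['A-100', 'B-200']
print(fields)    # ['SKU', 'Quantity', 'SKU', 'Quantity']
print(last_sku)  # B-200
```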

For example, the path /CustomerOrder/LineItems/Item[1]/Price selects the price of the first line item in a customer order. XPath positions start at 1, not 0. The path deliberately mimics file-system paths, because a file system, like an XML document, is a tree of structured data. A path describes how to navigate from one point in the tree to another, and each piece of data is a node in the tree, reachable along a particular path.
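Because off-by-one mistakes are easy here, this small Python sketch checks positional predicates with the standard xml.etree.ElementTree module, which supports only a subset of XPath. The document is hypothetical and follows the path layout of the example above. In full XPath, Item[0] simply matches nothing. ElementTree is stricter and raises a SyntaxError.

```python
import xml.etree.ElementTree as ET

# Hypothetical order laid out like the example path (no OrderInformation level).
ORDER_XML = """
<CustomerOrder>
  <LineItems>
    <Item><Price>9.99</Price></Item>
    <Item><Price>24.50</Price></Item>
  </LineItems>
</CustomerOrder>
"""
root = ET.fromstring(ORDER_XML.strip())  # root is <CustomerOrder>

# XPath positions are 1-based: Item[1] is the first line item.
first_price = root.findtext("LineItems/Item[1]/Price")
second_price = root.findtext("LineItems/Item[2]/Price")

# Position 0 never identifies an element; ElementTree rejects it outright.
try:
    root.findtext("LineItems/Item[0]/Price")
    zero_rejected = False
except SyntaxError:
    zero_rejected = True

print(first_price, second_price, zero_rejected)  # 9.99 24.50 True
```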

By example
Consider a simple XML document, like the one shown in Listing A. This document illustrates some of the essential information used in creating a customer order that might drive an invoicing system.

We'll use this CustomerOrder data to drive our fulfillment operation. The shipping center uses an antiquated system that accepts data only as a comma-delimited file. By combining XPath with an XSLT template, we will create a file that meets the shipping center's specifications.

The format for the shipping system is:
OrderNumber, Name, Address, City, State, Zip, SKU, Quantity
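The shipping format is a fixed field order. As an illustration, this Python sketch renders one record in that order using the standard csv module. The sample values are hypothetical. The csv module quotes any value that itself contains a comma. Whether the shipping center's system accepts quoted fields is an assumption, not something the article specifies.

```python
import csv
import io

# Field order required by the shipping system, as listed in the article.
FIELDS = ["OrderNumber", "Name", "Address", "City", "State", "Zip", "SKU", "Quantity"]

def to_shipping_line(record):
    """Render one record (a dict keyed by FIELDS) as a comma-delimited line.

    csv.writer quotes a value containing a comma; accepting such quoting is an
    assumption about the shipping system.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([record[f] for f in FIELDS])
    return buf.getvalue()

# Hypothetical sample values.
line = to_shipping_line({
    "OrderNumber": "1001", "Name": "Jane Doe", "Address": "500 Oak Ave, Suite 2",
    "City": "Portland", "State": "OR", "Zip": "97201",
    "SKU": "A-100", "Quantity": "2",
})
print(line, end="")  # 1001,Jane Doe,"500 Oak Ave, Suite 2",Portland,OR,97201,A-100,2
```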

Unfortunately, the shipping system is not designed to ship multiple items in the same box, so each line item from the order will be shipped separately. That means we first need to work out how to loop through each Item in the XML document and create one output line per Item for the shipping system.

Using the xsl:for-each element, we can easily grab all of the Items from the XML document. Its select attribute holds an XPath expression that identifies the nodes the loop iterates over. In this case we want every Item, so the for-each element looks like this:
<xsl:for-each select="/CustomerOrder/OrderInformation/LineItems/Item">

Within this loop, the transformation processes each Item element under /CustomerOrder/OrderInformation/LineItems in turn. Notice that the XPath expression looks much like a directory path in a file system.
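For readers who want to see the loop outside XSLT, here is a rough Python analogue of that xsl:for-each, written with the standard xml.etree.ElementTree module (which supports only part of XPath). The document is hypothetical. The path is written relative to the <CustomerOrder> root element, so it omits that first step.

```python
import xml.etree.ElementTree as ET

# Hypothetical order with three line items.
ORDER_XML = """
<CustomerOrder>
  <OrderInformation>
    <LineItems>
      <Item><SKU>A-100</SKU><Quantity>2</Quantity></Item>
      <Item><SKU>B-200</SKU><Quantity>1</Quantity></Item>
      <Item><SKU>C-300</SKU><Quantity>5</Quantity></Item>
    </LineItems>
  </OrderInformation>
</CustomerOrder>
"""
root = ET.fromstring(ORDER_XML.strip())  # root is <CustomerOrder>

# Analogue of <xsl:for-each select="/CustomerOrder/OrderInformation/LineItems/Item">.
rows = []
for item in root.findall("OrderInformation/LineItems/Item"):
    # One output row per Item, because each line item ships separately.
    rows.append((item.findtext("SKU"), item.findtext("Quantity")))

print(len(rows))  # 3
print(rows[0])    # ('A-100', '2')
```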

Now that we can access each Item, we need to pull the rest of the data from the XML file. The shipping address is a major piece of the output file. We could assume that the shipping address is always the first Address element in the file, but that may not always be the case. Instead, XPath lets us select the address explicitly, which ensures that the correct one is used.

To get the CustomerName, Address, City, State, and Zip for the shipping address, we need an XPath query. A query is an XPath expression that adds a predicate, a condition in square brackets, so that only nodes meeting the criteria are selected. To find the shipping address, we need the Address element whose type attribute is "shipping." The query is:
/CustomerOrder/CustomerInformation/Address[@type='shipping']

We can place this query directly in the select attribute of an xsl:value-of element, adding /City to retrieve the customer's shipping city:
<xsl:value-of select="/CustomerOrder/CustomerInformation/Address[@type='shipping']/City"/>
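To show why the predicate matters, this Python sketch uses the standard xml.etree.ElementTree module, which does support [@attr='value'] predicates. The customer data is hypothetical, including the Street element name. The billing address deliberately comes first, so "take the first Address" picks the wrong one while the predicate finds the shipping address.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer data; the billing Address precedes the shipping one.
ORDER_XML = """
<CustomerOrder>
  <CustomerInformation>
    <CustomerName>Jane Doe</CustomerName>
    <Address type="billing">
      <Street>1 Main St</Street><City>Springfield</City>
      <State>IL</State><Zip>62701</Zip>
    </Address>
    <Address type="shipping">
      <Street>500 Oak Ave</Street><City>Portland</City>
      <State>OR</State><Zip>97201</Zip>
    </Address>
  </CustomerInformation>
</CustomerOrder>
"""
root = ET.fromstring(ORDER_XML.strip())  # root is <CustomerOrder>

# Positional guess: the first Address, which here is the billing address.
naive = root.findtext("CustomerInformation/Address[1]/City")

# Predicate: only the Address whose type attribute equals 'shipping'.
shipping = root.find("CustomerInformation/Address[@type='shipping']")

print(naive)                      # Springfield
print(shipping.findtext("City"))  # Portland
```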

Now we're ready to put all of this together into the final XSLT template, shown in Listing B.

Notice that the SKU and Quantity fields do not use an absolute path. That's because they are relative paths, evaluated against the current context node, which the for-each loop sets to each Item in turn. The output from this transformation is shown in Listing C.
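As a rough end-to-end analogue, here is a Python sketch using the standard xml.etree.ElementTree module. It is not a reproduction of Listing B or Listing C. It assumes a hypothetical document in which OrderNumber sits under OrderInformation, CustomerName sits under CustomerInformation, and the street line is a Street element. Order-level values are looked up once. Inside the loop, SKU and Quantity are read relative to the current Item, mirroring the context-relative paths in the stylesheet. Fields are joined with plain commas, so a value that itself contained a comma would break the format.

```python
import xml.etree.ElementTree as ET

# Hypothetical order; Street, and where OrderNumber/CustomerName live, are assumptions.
ORDER_XML = """
<CustomerOrder>
  <CustomerInformation>
    <CustomerName>Jane Doe</CustomerName>
    <Address type="billing">
      <Street>1 Main St</Street><City>Springfield</City><State>IL</State><Zip>62701</Zip>
    </Address>
    <Address type="shipping">
      <Street>500 Oak Ave</Street><City>Portland</City><State>OR</State><Zip>97201</Zip>
    </Address>
  </CustomerInformation>
  <OrderInformation>
    <OrderNumber>1001</OrderNumber>
    <LineItems>
      <Item><SKU>A-100</SKU><Quantity>2</Quantity></Item>
      <Item><SKU>B-200</SKU><Quantity>1</Quantity></Item>
    </LineItems>
  </OrderInformation>
</CustomerOrder>
"""

def shipping_lines(xml_text):
    """Return one comma-delimited shipping line per Item (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())  # root is <CustomerOrder>

    # Order-level values: looked up once, with paths relative to the root.
    order_no = root.findtext("OrderInformation/OrderNumber")
    name = root.findtext("CustomerInformation/CustomerName")
    addr = root.find("CustomerInformation/Address[@type='shipping']")
    if addr is None:
        raise ValueError("order has no shipping address")
    prefix = [order_no, name, addr.findtext("Street"), addr.findtext("City"),
              addr.findtext("State"), addr.findtext("Zip")]

    lines = []
    for item in root.findall("OrderInformation/LineItems/Item"):
        # "SKU" and "Quantity" are relative to the current Item, much like the
        # unqualified paths inside xsl:for-each.
        lines.append(",".join(prefix + [item.findtext("SKU"), item.findtext("Quantity")]))
    return lines

lines = shipping_lines(ORDER_XML)
for line in lines:
    print(line)
```

With this sample input, the sketch prints one line per Item, each beginning with the same order and shipping-address fields.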

Follow the XPath
XPath is a great tool for accessing data in XML documents. Combined with XSLT, it gives you a robust way to query, loop over, and extract values from XML. In this article, we've illustrated some basic XPath concepts and shown a simple example of using XPath expressions to access XML data.

