In the world of XML, everything revolves around the Document Object Model (DOM). The DOM is the set of nodes, elements, attributes, and other items that represent the current XML file or data. Using DOM, a program can process the data from XML more efficiently than a custom parser application can. Here’s a look at the basics of processing XML documents, nodes, elements, and attributes with Java via DOM.
Loading and saving documents
To begin working, an XML stream or file needs to be loaded into a Document class. Once loaded, the information is manipulated and then saved to a file or sent out to another stream. This type of processing is generally called DOM processing. Other ways to work with XML documents include directly manipulating the file and using XSLT.
Java API for XML Processing (JAXP) is used to create the document object. In Java, most XML classes follow JAXP standards so that different XML processors can be plugged into the application. The examples and listings in this article use the Xalan 2.3.x classes available from Apache. The latest version, 2.4D1, should work as well. The xalan.jar, xercesImpl.jar, and xml-apis.jar files need to be in the classpath. Listing A shows most of the classes needed to load, process, and save an XML document.
The first step is to create a document with a Document Builder Factory that has been predefined by JAXP (see Listing B).
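As a rough illustration of this step, the sketch below loads XML through the JAXP factory classes that ship with the JDK. The class name LoadSketch and its load method are hypothetical names chosen for this example, and the input is parsed from a string only to keep the sketch self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class LoadSketch {
    // Parses XML text into a DOM Document using whichever parser JAXP locates.
    static Document load(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // For a file, builder.parse(new java.io.File(path)) works the same way.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Sketch only: real code would handle parser and I/O errors separately.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Because the factory is obtained through JAXP, the same code should run against whichever compliant parser is on the classpath.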
Once the document is loaded, any node or element is accessible, and information such as the DTD location and DocType is available. Saving an XML document involves creating an output stream and a serializer that’s responsible for transforming the objects into text that can be saved as a file (Listing C).
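One way to sketch the save step with nothing but JAXP is an identity Transformer, which is a stand-in here rather than the serializer class the text describes. SaveSketch, serialize, and sample are hypothetical names, and the output goes to a string instead of a file so the example stays self-contained.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, named for this sketch only.
class SaveSketch {
    // Writes the DOM tree out as text via an identity transform.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // To save to disk, pass a StreamResult wrapping a File or OutputStream.
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Builds a tiny in-memory document so the sketch has something to save.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element product = doc.createElement("product");
            product.setAttribute("id", "p1");
            product.appendChild(doc.createTextNode("Shirt"));
            doc.appendChild(product);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample document", e);
        }
    }
}
```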
Working with nodes and elements
Developers often confuse XML nodes and XML elements. Within an XML document, “node” and “element” can be, and often are, used almost interchangeably. But when you’re programming DOM, they are very different. The main thing to remember is that everything is a node in the DOM. The Element interface extends Node, adding methods for working with attributes.
For example, the tag <this/> is a node, and in DOM terms it is already an element node. When an attribute or text is added (<this id="">text</this>), the element gains an attribute node and a child text node. The latter example, having an attribute and text, is also called a complex node and should be avoided. In Java (and more precisely, Xalan), elements are primarily used for working with attributes. Retrieving the text value for a node is fairly straightforward:
Node productNode = XPathAPI.selectSingleNode(doc,xPath);
This returns the first node matching the XPath query. If the node has a text value (<node>my value</node>), it can be accessed by retrieving the value of the first child node like this:
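A minimal sketch of that lookup, assuming the first child of the node is its text node. TextValueSketch, textOf, and parse are hypothetical names used only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class TextValueSketch {
    // Returns the value of the node's first child, or null if it has no children.
    static String textOf(Node node) {
        Node first = node.getFirstChild();
        return first == null ? null : first.getNodeValue();
    }

    // Small parsing helper so the sketch is self-contained.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The null check matters for empty tags such as <node/>, which have no child text node to read.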
Remember, everything in an XML document is treated as a node. Listing D demonstrates the creation of a “text” node.
The XPathAPI class is used to retrieve nodes from the XML document, including the nodes that new content will be attached to. XPath is a separate W3C specification for addressing parts of an XML document. Xalan provides a package for it, which appears among the classes in Listing A.
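For readers without Xalan on the classpath, the same kind of lookup can be sketched with the javax.xml.xpath package from the JDK. This is a substitute for Xalan's XPathAPI class, not the class itself, and XPathSketch with its two methods are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class XPathSketch {
    // Evaluates the expression against the context node and asks for one node back.
    static Node selectSingleNode(Node context, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expression, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    // Small parsing helper so the sketch is self-contained.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

A call such as XPathSketch.selectSingleNode(doc, "/catalog/product[@id='p2']/name") would then play the role of the XPathAPI call shown earlier.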
To retrieve the value of an attribute, we must first convert the node to an element by using an explicit cast:
Element el = (Element) productNode;
I recommend having both a node and an element of the same data when processing elements. When you update the document, a node, not an element, needs to be appended to or inserted into the document. Elements provide programmers with more methods for working with attributes. The two most common are:
String sValue = el.getAttribute("id");
el.setAttribute("id", sValue);
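Putting the cast and the attribute calls together might look like the sketch below. AttributeSketch, readId, writeId, and sampleProduct are hypothetical names, and the node type check is an added precaution rather than something the text requires.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, named for this sketch only.
class AttributeSketch {
    // Casts the node to an Element after confirming the cast is safe.
    static Element asElement(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            throw new IllegalArgumentException("Not an element: " + node.getNodeName());
        }
        return (Element) node;
    }

    // getAttribute returns an empty string when the attribute is absent.
    static String readId(Node node) {
        return asElement(node).getAttribute("id");
    }

    // setAttribute creates the attribute or overwrites its current value.
    static void writeId(Node node, String value) {
        asElement(node).setAttribute("id", value);
    }

    // Builds a small element so the sketch is self-contained.
    static Element sampleProduct() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element product = doc.createElement("product");
            product.setAttribute("id", "p1");
            doc.appendChild(product);
            return product;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample element", e);
        }
    }
}
```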
You have a couple of choices when adding a node to the document. You can append a node to the end of the current node’s child nodes or insert it into a certain location in the list of children. If you need to keep all the same type of nodes in a particular order, inserting the new node into the correct place is probably necessary. Otherwise, you can just append the node as shown in Listing D.
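The append case can be sketched as follows, assuming a "colors" parent holding "color" children. AppendSketch, appendColor, and parse are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class AppendSketch {
    // Creates <color>name</color> and adds it after the parent's last child.
    static Element appendColor(Element colors, String name) {
        // New nodes must be created by the document that will own them.
        Document doc = colors.getOwnerDocument();
        Element color = doc.createElement("color");
        // The text is a separate node that becomes the element's child.
        color.appendChild(doc.createTextNode(name));
        colors.appendChild(color);
        return color;
    }

    // Small parsing helper so the sketch is self-contained.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```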
To insert the node into a particular location, you need to have three nodes available: the new node, its parent node, and the node you want to insert before. If this last node is null, insertBefore works just like the appendChild method. In Listing E, the new color will be inserted before any size nodes.
You can use Listing F if the product node isn’t necessary.
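A minimal sketch of the insertion case, assuming a "product" parent whose "color" children should stay ahead of its "size" children. InsertSketch and its methods are hypothetical names, and the reference node is found with a plain sibling walk rather than an XPath query.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class InsertSketch {
    // Inserts <color>name</color> before the first <size> child of product.
    static Element insertColorBeforeSizes(Element product, String name) {
        Document doc = product.getOwnerDocument();
        Element color = doc.createElement("color");
        color.appendChild(doc.createTextNode(name));

        // Find the node to insert before; it stays null if there is no <size>.
        Node firstSize = null;
        for (Node c = product.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && "size".equals(c.getNodeName())) {
                firstSize = c;
                break;
            }
        }
        // With a null reference node, insertBefore adds the child at the end.
        product.insertBefore(color, firstSize);
        return color;
    }

    // Lists the element children of parent, in order, as a comma-separated string.
    static String childNames(Element parent) {
        StringBuilder names = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                if (names.length() > 0) {
                    names.append(',');
                }
                names.append(c.getNodeName());
            }
        }
        return names.toString();
    }

    // Small parsing helper so the sketch is self-contained.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```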
Retrieving a set of nodes matching an XPath criterion is often necessary as well. The NodeList interface handles a series of nodes as an ordered collection. Listing G shows a basic example of looping through a list of nodes. In this example, the “colors” node contains a list of “color” nodes and a color (“red”) needs to be removed from the list. You can include more logic to handle adding nodes or to modify other nodes in the list.
In this example, the node could have been removed without looping through a node list. This example could include some other processing on each node, such as capitalizing the first letter or changing “purple” to “magenta.”
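The loop described above might be sketched like this. RemoveSketch, removeColor, and parse are hypothetical names, and the list comes from getElementsByTagName rather than an XPath query, which is a substitution made to keep the example free of extra libraries.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
class RemoveSketch {
    // Removes every <color> under colors whose text equals unwanted.
    static int removeColor(Element colors, String unwanted) {
        NodeList list = colors.getElementsByTagName("color");
        int removed = 0;
        // This NodeList is live, so walk it backwards while removing items.
        for (int i = list.getLength() - 1; i >= 0; i--) {
            Node color = list.item(i);
            if (unwanted.equals(color.getTextContent().trim())) {
                color.getParentNode().removeChild(color);
                removed++;
            }
            // Other per-node processing, such as renaming a color, could go here.
        }
        return removed;
    }

    // Small parsing helper so the sketch is self-contained.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Iterating from the end avoids skipping items, since removing a node from the tree also shortens a live list.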
Why the DOM?
The DOM is a great vehicle for working with and processing data. If you haven’t started working with XML yet, chances are that the day is coming when you will. Getting a good foundation in DOM programming is well worth the effort. Many commercial applications also process XML files, so understanding how these programs process data will help build confidence when programming around them.