So far in our XML series, we’ve covered basic syntax and enforcing document structure through the use of DTDs and XML Schema. Now it’s time to put on your programmer’s hat and get acquainted with the Document Object Model (DOM), which provides easy access to XML documents via a tree-like set of objects. Since there are DOM implementations in quite a few languages, I’ll try to keep things as language-neutral as possible in the process of introducing you to the specification, so the focus here is on the interfaces themselves rather than on any one implementation.

Some things you should know about DOM
DOM is in reality nothing more than an abstract specification for accessing the content of a given document using a tree-like set of objects. The document doesn’t necessarily have to be an XML document; keep that in mind as you read along.

As with all things Web, the DOM specification is managed by the World Wide Web Consortium (W3C). Operating under a mandate to provide a uniform API for use with multiple platforms and languages, the W3C defines DOM as a set of abstract classes without an official implementation. So it’s up to individual vendors to actually provide implementations of the specification’s interfaces that are appropriate for a given platform and language.

DOM’s interface definitions were created using the Object Management Group’s Interface Definition Language (IDL). It can often be helpful to examine these definitions even if you have no formal knowledge of IDL, which is fairly self-explanatory. I’ve linked to the appropriate IDL definition for each interface I mention in this article so that you can refer to it and the accompanying documentation if necessary.

DOM has three levels of functionality:

  • Level 1 provides only the most basic support for navigating and manipulating the structure and content of an XML document.
  • Level 2 extends Level 1 by providing support for XML namespaces. This is the currently recommended level of functionality, and I’ll be referring you to Level 2 versions of the DOM interfaces in this article.
  • Level 3, which as of the day I’m writing this is still in the working draft phase (meaning it’s subject to change), adds support for XPath queries and for loading and saving documents.

Because the W3C’s specification is only a minimum recommendation, vendors can, and often do, provide proprietary extensions. This is why, for example, many of the available DOM implementations will already have XPath support built-in. You should be wary of using these extensions, particularly ones that represent Level 3 functionality. The interfaces of those objects are still very much subject to change, and the final, official versions may be incompatible with code you’ve written for the working versions.

DOM’s object model (Is that redundant?)
DOM expresses a document as a tree of Node objects. If you’ll recall, a tree is defined as a set of interconnected objects, or nodes, with one node providing the root. Nodes are given names corresponding to their relative position to another node in the tree. For example, a node’s parent node is the node one level up (closer to the root element) in the tree’s hierarchy, while a child node is one level down; a sibling is to the immediate right or left of a node on the same level of the tree. Figure A gives a more graphical explanation of these terms, which you can refer to if you find any of this family business confusing.

Figure A
A graphical illustration of node relationships

Node objects not only represent XML elements in a document, but they also represent everything else found in a document, from the topmost document node itself to individual content pieces like attributes, comments, and data. Each node has a specialized interface that corresponds to the XML content it represents, but these are all still nodes at heart. Object-oriented folks would say that all DOM objects inherit from Node. The Node interface is the primary means you’ll use to navigate a document’s tree and modify the structure of a document by adding new nodes.

The node knows
Node exposes a few navigation members that allow you to move about a document’s tree. The parentNode property returns the parent of the current node, while the nextSibling and previousSibling properties return the right-hand and left-hand siblings of the current node. You can determine whether a given node has children by calling the hasChildNodes method.
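To make the navigation members concrete, here is a minimal sketch using Python’s standard xml.dom.minidom module, which is only one of many DOM implementations. The sample document and its element names are made up for illustration. In this implementation, parentNode, nextSibling, and previousSibling are read as attributes, while hasChildNodes is called as a method.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document. There is no whitespace between the tags,
# so the parser creates no extra text nodes between the elements.
doc = parseString("<library><book/><magazine/><dvd/></library>")

root = doc.documentElement               # the <library> element
magazine = root.firstChild.nextSibling   # second child of <library>

parent_name = magazine.parentNode.nodeName      # one level up
left_name = magazine.previousSibling.nodeName   # left-hand sibling
right_name = magazine.nextSibling.nodeName      # right-hand sibling

root_has_children = root.hasChildNodes()
leaf_has_children = magazine.hasChildNodes()

# A first child has no left-hand sibling, so the property is None.
no_sibling = root.firstChild.previousSibling
```

Other implementations will spell the same members according to their own language’s conventions, but the relationships they expose are the same.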

Assuming that a node has children, these children can be retrieved using the childNodes property. childNodes returns all of the direct (one level down) children of the current node in a NodeList structure. NodeLists represent a group of nodes as an ordered list (retrievable by index number), while their cousins, NamedNodeMaps, represent them as a dictionary (retrievable by name). Both of these objects are “live,” meaning that changes made to the underlying tree are immediately reflected in any list or map you have already retrieved.
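The difference between the two collection types, and what “live” means in practice, can be sketched as follows. This again assumes Python’s xml.dom.minidom and an invented sample document; item, length, and getNamedItem are the DOM spellings, and minidom also allows ordinary Python indexing.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document with two attributes and two child elements.
doc = parseString('<book isbn="123" lang="en"><title/><author/></book>')
book = doc.documentElement

children = book.childNodes                 # NodeList: ordered, indexed
names = [child.nodeName for child in children]
first = children.item(0).nodeName          # children[0] also works in Python

attrs = book.attributes                    # NamedNodeMap: keyed by name
isbn = attrs.getNamedItem("isbn").value

# Liveness: the list retrieved earlier sees a later change to the tree.
count_before = children.length
book.appendChild(doc.createElement("publisher"))
count_after = children.length
```

Because the list is live, code that loops over childNodes while adding or removing children should be written with care; copying the list first is a common precaution.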

Node objects also expose a set of methods for adding nodes to and deleting nodes from their group of children. The insertBefore method inserts a new node immediately before (to the left of) another node in a list of child nodes, while appendChild appends a node to the end (the extreme right) of the current node’s list of children. The replaceChild method directly replaces one child node with another, while removeChild deletes a node from a group of child nodes.
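Here is a short sketch of the four child-manipulation methods, using the names the W3C Node interface gives them (insertBefore, appendChild, replaceChild, and removeChild) as implemented by Python’s xml.dom.minidom. The document and the child_names helper are hypothetical and exist only for this illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical starting document: <list> with children b and c.
doc = parseString("<list><b/><c/></list>")
lst = doc.documentElement


def child_names(node):
    """Return the names of a node's direct children, left to right."""
    return [child.nodeName for child in node.childNodes]


b = lst.firstChild
lst.insertBefore(doc.createElement("a"), b)   # new node goes left of b
lst.appendChild(doc.createElement("d"))       # new node goes at the far right
after_insert = child_names(lst)

# replaceChild takes the new node first and the old node second.
lst.replaceChild(doc.createElement("x"), b)
after_replace = child_names(lst)

# removeChild hands back the node it detached from the tree.
removed = lst.removeChild(lst.lastChild)
after_remove = child_names(lst)
```

Note that new nodes are created through the Document object before being attached; a node belongs to the document that created it.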

Specialized node interfaces
As I’ve said, the Node interface provides a convenient way to navigate a document and modify it, but to do much meaningful work, you’ll need to explore the more specialized DOM interfaces. In the remainder of this article, I’ll examine a few of these interfaces.

Document is the root
The Document interface extends the Node interface to represent an entire XML document and serves as the root of the document’s tree. The document’s root element (the single outermost element in the XML) is a child of the Document node and is available through its documentElement property. Most implementations hand you a Document object when you load an XML document. Document is sort of a catch-all for things that affect the document as a whole or that don’t really fit anywhere else. Most of its methods serve as factory methods for creating other DOM objects. These “createX” methods provide a way to create Element, DocumentFragment, Text, Comment, CDATASection, ProcessingInstruction, attribute (Attr), and EntityReference nodes, along with namespace-aware variants such as createElementNS and createAttributeNS, for implementations in languages that don’t support traditional constructors.
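As a rough illustration of the factory methods, the sketch below builds a small document from scratch with Python’s xml.dom.minidom. The element names and text are invented, and getDOMImplementation is simply how this particular implementation hands you an empty Document; other implementations provide their own entry points.

```python
from xml.dom.minidom import getDOMImplementation

# Ask the implementation for a new document whose outermost element is
# <catalog> (no namespace, no document type).
impl = getDOMImplementation()
doc = impl.createDocument(None, "catalog", None)
root = doc.documentElement

# Every new node is created by the Document, then attached to the tree.
item = doc.createElement("item")
item.appendChild(doc.createTextNode("Widget"))
root.appendChild(doc.createComment(" generated for illustration "))
root.appendChild(item)

# The Document node sits above the outermost element in the tree.
root_parent_is_doc = root.parentNode is doc
child_types = [child.nodeType for child in root.childNodes]
xml_text = root.toxml()
```

The toxml call is a minidom convenience for serializing a node and is not part of the DOM Level 2 interfaces.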

Document also includes two useful methods for moving to particular locations in a document:

  • getElementsByTagName returns a NodeList of all elements with a given tag name in the order they were encountered in the document. This is a handy method for retrieving all instances of a particular element in a document, and since the elements are returned as nodes, navigation around the document is possible.
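  • getElementById returns the element with an attribute of type ID that matches the specified ID. It’s useful for quickly locating a single element in a document. Note that an attribute counts as an ID only if it has been declared as one (for example, in a DTD); simply being named “id” is not enough.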

One final thing of interest about the Document interface is that the Node interface exposes an ownerDocument property that returns the Document object to which a node belongs.
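The two lookup methods and the ownerDocument property can be sketched like this in Python’s xml.dom.minidom. The document is invented for the example. Because it has no DTD declaring an ID attribute, the sketch marks the attribute as an ID with setIdAttribute, a method minidom borrows from the Level 3 drafts; treat that call as an implementation-specific assumption rather than part of Level 2.

```python
from xml.dom.minidom import parseString

# Hypothetical document with <member> elements at two different depths.
doc = parseString(
    '<team><member id="m1">Ann</member>'
    '<lead><member id="m2">Bob</member></lead></team>'
)

# All matching elements, at any depth, in document order.
members = doc.getElementsByTagName("member")
ids_in_order = [m.getAttribute("id") for m in members]

# Without a DTD, the parser cannot know that "id" is of type ID, so the
# attribute is flagged explicitly (minidom-specific, Level 3 style).
for m in members:
    m.setIdAttribute("id")

bob = doc.getElementById("m2")
bob_parent = bob.parentNode.nodeName     # results are nodes, so you can navigate
missing = doc.getElementById("nope")     # no such ID in the document

# Every node can report the Document it belongs to.
same_owner = bob.ownerDocument is doc
```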

The elements of the tree
Okay, I got ahead of myself there and mentioned two ways of retrieving an element before talking about the Element interface. Element represents, as you’d expect, an XML element.

The Element interface deals quite a bit with attributes (which incidentally are also available through the attributes property of the base Node interface), with 13 methods that provide some form of access to attributes. Of these, you’ll likely use the getAttribute/setAttribute and getAttributeNode/setAttributeNode methods most often. The former allow you to read or write an attribute’s value directly, assuming you can supply the attribute’s name. The latter allow you to work with the actual Attr object that represents an attribute.
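The two pairs of attribute methods can be compared in a short sketch, again assuming Python’s xml.dom.minidom and a made-up element. The first pair works with plain string values; the second pair works with Attr objects.

```python
from xml.dom.minidom import parseString

# Hypothetical element with a single attribute.
doc = parseString('<book isbn="123"/>')
book = doc.documentElement

# Value-based access: read and write by attribute name.
isbn = book.getAttribute("isbn")
book.setAttribute("lang", "en")
lang = book.getAttribute("lang")
absent = book.getAttribute("price")   # minidom returns "" for a missing attribute

# Node-based access: work with the Attr object itself.
attr_node = book.getAttributeNode("isbn")
attr_node.value = "456"               # changing the Attr changes the element
updated = book.getAttribute("isbn")

# A new Attr is created by the Document, then attached to the element.
new_attr = doc.createAttribute("edition")
new_attr.value = "2"
book.setAttributeNode(new_attr)
has_edition = book.hasAttribute("edition")
```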

Conspicuously missing from the Element interface is any method of retrieving the data associated with an element. This is because a given element’s data is considered to be a child node of that element, which is retrievable via the childNodes property of the base Node interface. If it contains only simple character data, then an element’s data node will simply implement the Text interface. However, in the case of complex data, the element will have a group of child nodes implementing the Element, Text, and other content interfaces (such as Comment or CDATASection), depending on the type of data. Attributes are not part of this group; they are reached through the attribute methods instead.
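Since there is no “get the text” method on Element in the interfaces discussed here, reading an element’s data means walking its children. The sketch below shows both the simple and the mixed case using Python’s xml.dom.minidom; the document and the text_of helper are hypothetical names introduced for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical document: <to> holds simple text, <body> holds mixed content.
doc = parseString(
    "<note><to>Ann</to><body>Call <b>now</b> please</body></note>"
)

# Simple case: the element's data is a single Text child.
to = doc.getElementsByTagName("to")[0]
simple_text = to.firstChild.data
is_text_node = to.firstChild.nodeType == to.TEXT_NODE

# Mixed case: Text and Element children are interleaved.
body = doc.getElementsByTagName("body")[0]
mixed_names = [child.nodeName for child in body.childNodes]


def text_of(node):
    """Concatenate all character data found beneath a node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


body_text = text_of(body)
```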

Figure B illustrates the complicated relationship between an element and its data node.

Figure B
Two elements with associated data nodes

Fragments of a document
When working with XML, a common task is to create a new set of elements and append them to an existing document. The DocumentFragment interface minimally extends Node by changing the behavior of the insertion methods (insertBefore, appendChild, and replaceChild) so that when a DocumentFragment is inserted into a document, only its child nodes are inserted, not the DocumentFragment node itself. This makes DocumentFragment an ideal temporary attachment point for new nodes in an XML tree.
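The fragment behavior described above can be sketched in a few lines with Python’s xml.dom.minidom. The menu document and its labels are invented for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical existing document with one <item>.
doc = parseString("<menu><item>tea</item></menu>")
menu = doc.documentElement

# Build the new elements off to the side, attached to a fragment.
fragment = doc.createDocumentFragment()
for label in ("coffee", "juice"):
    item = doc.createElement("item")
    item.appendChild(doc.createTextNode(label))
    fragment.appendChild(item)

in_fragment = fragment.childNodes.length

# Inserting the fragment moves its children into the tree; the fragment
# node itself never becomes part of the document.
menu.appendChild(fragment)

node_names = [child.nodeName for child in menu.childNodes]
labels = [child.firstChild.data for child in menu.childNodes]
left_in_fragment = fragment.childNodes.length
```

After the insertion the fragment is empty, so it can be reused to stage the next batch of nodes.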

That’s about it for our guided tour of DOM. Stay tuned for the next part in my increasingly inaccurately named Remedial XML trilogy (Douglas Adams would be proud), when I’ll introduce the joys of the SAX API.