The Document Object Model (DOM) is the oldest, and arguably the easiest, method of working with generalized XML documents. Microsoft’s XML Core Services 4.0 (MSXML2) provides a competent DOM parser that meets or extends all the W3C’s recommendations for a DOM Level 3 parser. The trouble is that the documentation included with the library, frankly, sucks. Wading through the myriad COM classes is not for the faint of heart, and even the most trivial tasks are glossed over.
There are three basic tasks you’ll perform with a DOM parser: reading an existing document, editing an existing document, and creating a new document. You’ll probably be concerned with validating a document in one way or another as well. I’ve created a sample VB6 application called BookEditor, which demonstrates the use of MSXML’s DOM parser for creating and populating a new XML document that will look similar to the book catalog XML file used throughout my “Remedial XML” series. You can download the code for this sample app here.
Which DOMDocument do I use?
The DOM parser itself is implemented in a DOMDocument object, which also represents the ultimate root of the DOM tree. Unfortunately, no fewer than four DOMDocument objects are to be found in the MSXML2 library—a confusing state of affairs. Microsoft’s reason for creating so many document objects is beyond the scope of this article (it has to do with backward compatibility and COM ProgIDs). So let’s just say that DOMDocument40 represents the latest version of DOMDocument and is the one we’ll use in this article.
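As a rough illustration of what choosing DOMDocument40 looks like in practice, the sketch below creates the parser two ways. It assumes the VB6 project references "Microsoft XML, v4.0"; the procedure name CreateParserDemo is hypothetical and does not come from the BookEditor sample.

```vb
Option Explicit

' Sketch only: two ways to get a version 4.0 DOM parser instance.
Public Sub CreateParserDemo()
    ' Early binding: the class name pins the parser version.
    Dim doc As MSXML2.DOMDocument40
    Set doc = New MSXML2.DOMDocument40

    ' Late binding: the version-dependent ProgID pins the version instead.
    Dim lateDoc As Object
    Set lateDoc = CreateObject("Msxml2.DOMDocument.4.0")

    ' Both variables now refer to separate, empty documents.
    Debug.Print TypeName(doc), TypeName(lateDoc)
End Sub
```

Early binding gives you IntelliSense and compile-time checking, which is why it is usually the more comfortable choice while exploring the library.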
All the elements found in a document are arranged as child nodes of the document object, which also exposes a number of factory methods for creating the different kinds of nodes you’ll work with as you move around a document. IXMLDOMNode is the base interface for every variety of node you’ll use when working with an XML document. The node types you need to be concerned with appear in Figure A.
Figure A: A few key node types
Moving around in a document
The base IXMLDOMNode interface exposes properties you can use to move about in a document’s tree. The firstChild and lastChild properties return the first and last nodes found under the current node. From there, you can move to that child node’s next or previous sibling using the nextSibling and previousSibling properties. Check the nodeType property to determine what kind of node the current node is. If it’s an element, you can read the text it contains through the text property; nodeValue is Null for element nodes and holds content only for node types such as text, attribute, and comment nodes.
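The navigation properties described above can be sketched as a simple sibling walk. This is a minimal example rather than code from BookEditor: the ListChildren and ListChildrenDemo names and the sample XML string are made up for illustration. Because MSXML returns Null from nodeValue on element nodes, the sketch reads element content through the text property.

```vb
Option Explicit

' Sketch: visit each direct child of a node via firstChild/nextSibling.
Public Sub ListChildren(ByVal parent As MSXML2.IXMLDOMNode)
    Dim child As MSXML2.IXMLDOMNode
    Set child = parent.firstChild

    Do While Not child Is Nothing
        Select Case child.nodeType
            Case NODE_ELEMENT
                ' text returns the concatenated text content of the element.
                Debug.Print "element: " & child.nodeName & " = " & child.Text
            Case NODE_TEXT
                ' Text nodes carry their content in nodeValue.
                Debug.Print "text: " & child.nodeValue
            Case Else
                Debug.Print "other: " & child.nodeName
        End Select
        Set child = child.nextSibling
    Loop
End Sub

' Hypothetical driver that feeds a small in-memory document to ListChildren.
Public Sub ListChildrenDemo()
    Dim doc As MSXML2.DOMDocument40
    Set doc = New MSXML2.DOMDocument40
    doc.async = False

    If doc.loadXML("<book><title>Sample</title><price>10.00</price></book>") Then
        ListChildren doc.documentElement
    End If
End Sub
```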
You can also access a node’s children through an IXMLDOMNodeList collection, which you retrieve using the childNodes property of a node instance. IXMLDOMNodeList is an indexed-list collection; that is, you can retrieve any of a node’s child nodes through its numeric, zero-based index.
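Index-based access might look like the following sketch. The procedure name ListByIndex is hypothetical; length and item are the standard IXMLDOMNodeList members.

```vb
Option Explicit

' Sketch: enumerate a node's children through the zero-based node list.
Public Sub ListByIndex(ByVal parent As MSXML2.IXMLDOMNode)
    Dim kids As MSXML2.IXMLDOMNodeList
    Dim i As Long

    Set kids = parent.childNodes

    ' Valid indexes run from 0 to length - 1.
    For i = 0 To kids.length - 1
        Debug.Print i & ": " & kids.Item(i).nodeName
    Next i
End Sub
```

The list is live, so if you add or remove children while looping, length changes underneath you; collecting the nodes first is the safer pattern when you intend to modify the tree.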
Creating a new XML document
The sample BookEditor application allows a user to create a new XML document based on the values entered for a hypothetical book. The TreeView control on Form1 displays the contents of the currently loaded book catalog, as you can see in Figure B.
Figure B: BookEditor displaying three books
The first thing BookEditor does upon startup is to initialize a new, blank XML document to use as a root for adding new book elements to the document tree. This is done inside the SetupCatalogDoc Sub, shown in Listing A.
Before creating any nodes, I configure the DOM parser by setting a few properties of the module-level variable CatalogDoc, a DOMDocument40 object:
- The async property controls whether the parser operates in asynchronous mode. Asynchronous mode (the default) allows your application to perform other tasks while the DOM parser builds the node tree for a large document. Since my demo app works with relatively small documents, I set the parser to synchronous mode (false) to cut out the complexity involved in monitoring the progress of an asynchronous parse.
- The preserveWhiteSpace property controls whether the parser inserts extra text nodes into the DOM tree to represent white space in an XML document. These extra nodes can complicate node navigation, so I turned this feature off by setting the property to false.
- The resolveExternals property controls whether the parser attempts to load externally referenced resources, such as DTDs, schemas, and resolvable namespaces, found in the XML document. You’ll find this feature handy when dealing with a document that has an associated XML schema or DTD: Setting resolveExternals to true causes the parser to automatically load the schema or DTD so the document can be validated against it later.
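The three settings above could be applied roughly as follows. This is a sketch, not the article's Listing A: the procedure name ConfigureParser is hypothetical, and the value chosen for resolveExternals is an assumption, since the text does not say which value SetupCatalogDoc uses.

```vb
Option Explicit

' Module-level document, named as in the article.
Private CatalogDoc As MSXML2.DOMDocument40

' Sketch: create the parser and set the three properties discussed above.
Private Sub ConfigureParser()
    Set CatalogDoc = New MSXML2.DOMDocument40

    ' Parse synchronously so load calls return only when the tree is complete.
    CatalogDoc.async = False

    ' Do not keep whitespace-only text nodes in the tree.
    CatalogDoc.preserveWhiteSpace = False

    ' Assumption: a brand-new document has no DTD or schema to fetch,
    ' so external resolution is switched off in this sketch.
    CatalogDoc.resolveExternals = False
End Sub
```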
With the parser set the way I want it, I can go about creating the nodes I need to form the root of the new document. First, I create a new IXMLDOMProcessingInstruction node to represent the XML declaration that should appear at the beginning of any XML document. Then, I create an IXMLDOMElement to represent the root catalog element of the document. After adding both these nodes to CatalogDoc, I have the following simple XML document (check the CatalogDoc.xml property to verify this):
<?xml version="1.0"?>
<catalog/>
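One way to produce that two-node document is sketched below. It is an approximation of what a routine like SetupCatalogDoc might do, not a copy of Listing A; the function name BuildEmptyCatalog and the demo procedure are hypothetical.

```vb
Option Explicit

' Sketch: build a document holding an XML declaration and an empty root.
Public Function BuildEmptyCatalog() As MSXML2.DOMDocument40
    Dim doc As MSXML2.DOMDocument40
    Dim pi As MSXML2.IXMLDOMProcessingInstruction
    Dim root As MSXML2.IXMLDOMElement

    Set doc = New MSXML2.DOMDocument40

    ' The XML declaration is created as a processing instruction named "xml".
    Set pi = doc.createProcessingInstruction("xml", "version=""1.0""")
    doc.appendChild pi

    ' The root element is appended to the document itself.
    Set root = doc.createElement("catalog")
    doc.appendChild root

    Set BuildEmptyCatalog = doc
End Function

' Hypothetical driver: print the serialized document to the Immediate window.
Public Sub BuildEmptyCatalogDemo()
    Debug.Print BuildEmptyCatalog().xml
End Sub
```

Note that createElement only manufactures the node; nothing appears in the document until appendChild attaches it somewhere in the tree.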
Creating a new book
Now that CatalogDoc is all set up with a root element, the user can add a new book by clicking the New Book button. Check out the cmdNew_Click and BuildNewBookNodes routines in Listing B to see this in action.
I first create a new book element using CatalogDoc.createElement, and append it to the root catalog element (accessed through the documentElement property of CatalogDoc). Next, I create a new IXMLDOMAttribute node to hold the id attribute for the book the user is about to add. In this case, I just count the current number of books in the document (which includes the new book element we just added), and append that number to the text “bk.” A call to BuildNewBookNodes adds the empty author, title, genre, price, publish_date, and description elements so that the XML document now looks like this:
<?xml version="1.0"?>
<catalog>
  <book id="bk1">
    <author></author>
    <title></title>
    <genre></genre>
    <price></price>
    <publish_date></publish_date>
    <description></description>
  </book>
</catalog>
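The steps just described might be sketched like this. It is not the article's Listing B: the AddEmptyBook and AddEmptyBookDemo names are hypothetical, and counting books with selectNodes is one possible way to implement the count the text mentions. The sketch assumes the document already has a catalog root element.

```vb
Option Explicit

' Sketch: append an empty book, with a generated id, to the catalog root.
Public Function AddEmptyBook(ByVal doc As MSXML2.DOMDocument40) As MSXML2.IXMLDOMElement
    Dim book As MSXML2.IXMLDOMElement
    Dim idAttr As MSXML2.IXMLDOMAttribute
    Dim names As Variant
    Dim i As Long

    ' Create the book element and attach it under the root first,
    ' so the count below includes it.
    Set book = doc.createElement("book")
    doc.documentElement.appendChild book

    ' Build the id from the number of book elements now in the catalog.
    Set idAttr = doc.createAttribute("id")
    idAttr.Value = "bk" & doc.documentElement.selectNodes("book").length
    book.setAttributeNode idAttr

    ' Add the empty child elements in document order.
    names = Array("author", "title", "genre", "price", "publish_date", "description")
    For i = LBound(names) To UBound(names)
        book.appendChild doc.createElement(CStr(names(i)))
    Next i

    Set AddEmptyBook = book
End Function

' Hypothetical driver: start from an empty catalog and add one book.
Public Sub AddEmptyBookDemo()
    Dim doc As MSXML2.DOMDocument40
    Set doc = New MSXML2.DOMDocument40
    doc.async = False

    If doc.loadXML("<catalog/>") Then
        AddEmptyBook doc
        Debug.Print doc.xml
    End If
End Sub
```

For a single attribute, calling book.setAttribute "id", someValue is a shorter alternative to creating an IXMLDOMAttribute node explicitly.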
The newly created book element is handed off to Form2 via a public variable, and the editing form is shown (Figure C). When the user finishes entering information about the new book and clicks OK, I insert the data from the form into the appropriate book subelements by traversing its tree of child nodes. Check out Listing C for the cmdOK_Click event handler where this takes place.
Figure C: Adding a new book to the catalog
We’re finally left with an XML document that looks like this (once again, check CatalogDoc.xml to verify):
<?xml version="1.0"?>
<catalog>
  <book id="bk1">
    <author>Lamont Adams</author>
    <title>Lamont’s First Book</title>
    <genre>Fantasy</genre>
    <price>10.00</price>
    <publish_date>2005-11-05</publish_date>
    <description>It’s fantasy until I actually write it</description>
  </book>
</catalog>
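Filling the subelements by walking the book’s child nodes could look something like the sketch below. It is not the article's Listing C: the SetChildText helper and the demo procedure are hypothetical, and the values are passed in as plain strings rather than read from Form2's controls.

```vb
Option Explicit

' Sketch: find the named child element by traversal and set its text.
Private Sub SetChildText(ByVal parent As MSXML2.IXMLDOMNode, _
                         ByVal childName As String, _
                         ByVal newText As String)
    Dim child As MSXML2.IXMLDOMNode
    Set child = parent.firstChild

    Do While Not child Is Nothing
        If child.nodeType = NODE_ELEMENT And child.nodeName = childName Then
            ' Assigning text replaces the element's content with one text node.
            child.Text = newText
            Exit Sub
        End If
        Set child = child.nextSibling
    Loop
End Sub

' Hypothetical driver using a cut-down book with two subelements.
Public Sub FillBookDemo()
    Dim doc As MSXML2.DOMDocument40
    Dim book As MSXML2.IXMLDOMNode

    Set doc = New MSXML2.DOMDocument40
    doc.async = False

    If Not doc.loadXML("<catalog><book id=""bk1""><author/><title/></book></catalog>") Then
        Exit Sub
    End If

    Set book = doc.documentElement.firstChild
    SetChildText book, "author", "Lamont Adams"
    SetChildText book, "title", "Lamont's First Book"

    Debug.Print doc.xml
End Sub
```

Matching on nodeName rather than on position keeps the code working if the order of the subelements ever changes.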
To be continued…
We’ve successfully created a new DOM document and added some data to it with the BookEditor application. I’ve only scratched the surface, though, as there’s still a lot more to be done: BookEditor needs to allow users to save and load an XML file and validate an in-memory document against a schema. I’ll show you how to do all this, and a little bit more, in a future article.