Learning your way around the DOM parser included in the Microsoft XML Core Services library (a.k.a. MSXML2) can be daunting. Let’s check out a sample Visual Basic 6 application that will make it a bit easier to understand how to parse, edit, and validate existing documents using MSXML’s DOM parser.

Throughout this article, I’ll be referring you to the sample BookEditor application we looked at last time. You can download the project here. Figure A shows the application. BookEditor uses MSXML2’s DOM parser to add, edit, and delete books from an XML-based book catalog, which seems to be the somewhat-overused standard introductory XML example. As a matter of fact, you may remember I used such a sample XML document in my Remedial XML series, which you are welcome to refer to if you find yourself in need of a little concept clarification.

Figure A
BookEditor in all its glory

This is part 2…where’s part 1?

This article winds up our two-part series on MSXML2’s DOM parser, the DOMDocument40 class. Part one showed you how to create a new XML document with the DOM and add a new book to the catalog.

Saving and loading documents in many forms
You may be surprised to learn that the W3C’s DOM Level 1 and Level 2 specifications provide no facility for saving or loading a document to or from a disk file. This is because the DOM is meant to be platform independent, and different platforms usually have different semantics for handling persistent data. However, Microsoft’s DOM parser extends the DOM specification by providing this missing functionality as methods of the DOMDocument40 class.

There are two methods for loading an XML document into a DOM tree. The first, Load, reads a document from a file identified by either a full path or a URL. The second, LoadXML, builds a DOM tree from a string containing the actual XML for the document. BookEditor uses the former in its cmdLoad_Click event handler (Listing A).
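To make the difference concrete, here is a minimal sketch of both calls. It is not the code from Listing A; it assumes the project references Microsoft XML, v4.0, and the file path and XML string are placeholders invented for illustration.

```vb
' Assumes a project reference to "Microsoft XML, v4.0" (MSXML2).
' The path and the XML string are hypothetical placeholders.
Public Sub LoadExamples()
    Dim FileDoc As MSXML2.DOMDocument40
    Dim StringDoc As MSXML2.DOMDocument40

    ' Load reads from a file path or URL and returns False on failure.
    Set FileDoc = New MSXML2.DOMDocument40
    FileDoc.async = False   ' wait until the whole document is parsed
    If Not FileDoc.Load("C:\books\catalog.xml") Then
        Debug.Print "Load failed: " & FileDoc.parseError.reason
    End If

    ' LoadXML builds the tree from a string that already holds the markup.
    Set StringDoc = New MSXML2.DOMDocument40
    If StringDoc.loadXML("<catalog><book id=""bk101""/></catalog>") Then
        Debug.Print StringDoc.documentElement.nodeName
    Else
        Debug.Print "LoadXML failed: " & StringDoc.parseError.reason
    End If
End Sub
```

Both methods return a Boolean, so checking the result and inspecting parseError is usually worthwhile before touching the tree.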

Saving a document is accomplished using the Save method, which accepts several kinds of destination argument. If passed a file name or URL, Save will attempt to open the file and write the document to it. You can also pass Save an instance of DOMDocument40, which causes the document to be loaded and parsed by that second parser instance, as if you had loaded the document into it using Load or LoadXML. Finally, Save can serialize the document to a stream object passed as its argument. In the interest of simplicity, BookEditor uses the first variant, a file name, in its cmdSave_Click event handler (also in Listing A).
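The first two variants might look something like the sketch below. Again, this is an illustration rather than BookEditor’s actual handler; the procedure name and file path are assumptions, and the project is assumed to reference Microsoft XML, v4.0.

```vb
' Hypothetical helper; assumes a reference to "Microsoft XML, v4.0".
Public Sub SaveExamples(ByVal CatalogDoc As MSXML2.DOMDocument40)
    Dim CopyDoc As MSXML2.DOMDocument40

    ' Variant 1: pass a file name (placeholder path) to write to disk.
    CatalogDoc.save "C:\books\catalog.xml"

    ' Variant 2: pass another DOMDocument40, which then parses the
    ' serialized document as if it had been loaded into it directly.
    Set CopyDoc = New MSXML2.DOMDocument40
    CatalogDoc.save CopyDoc
    Debug.Print CopyDoc.xml
End Sub
```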

Moving around the DOM tree to edit or delete a node
After loading an XML document, it’s likely that you’ll want to modify it in some fashion, whether by editing the contents of a node or by deleting the node altogether. Either way, the first step is to find the correct node in the DOM tree, which you can do in several ways. With BookEditor, navigating to the correct book node is a common first step before it can edit or delete a book. BookEditor uses three methods of navigating the tree, just to give you an idea of the different avenues open to you.

Using getElementsByTagName() and IXMLDOMNodeList
The DOMDocument40.getElementsByTagName method is useful when you need to locate a particular element that appears multiple times in a document, and you know the tag name used to define it. I use this method in BookEditor’s LocateDomNodeFromTreeNode function (Listing B) to find the book element representing the currently selected book in the TreeView control, so that the book can be edited or removed from the catalog.

First, I call CatalogDoc.getElementsByTagName("book"), which returns an IXMLDOMNodeList containing all elements in the current document with a tag name of "book". Then, using a For Each loop, I iterate through the node list until I find a book element with an id attribute matching the one I’m looking for and return that node to the function’s caller.
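A simplified version of that lookup could be sketched as follows. The function name FindBookById and its parameters are hypothetical stand-ins, not the LocateDomNodeFromTreeNode code from Listing B, and a reference to Microsoft XML, v4.0 is assumed.

```vb
' Hypothetical helper; assumes a reference to "Microsoft XML, v4.0".
' Returns the matching book node, or Nothing if no id matches.
Public Function FindBookById(ByVal CatalogDoc As MSXML2.DOMDocument40, _
                             ByVal BookId As String) As MSXML2.IXMLDOMNode
    Dim Books As MSXML2.IXMLDOMNodeList
    Dim Node As MSXML2.IXMLDOMNode
    Dim IdAttr As MSXML2.IXMLDOMNode

    Set Books = CatalogDoc.getElementsByTagName("book")
    For Each Node In Books
        ' getNamedItem returns Nothing when the attribute is missing.
        Set IdAttr = Node.Attributes.getNamedItem("id")
        If Not IdAttr Is Nothing Then
            If IdAttr.nodeValue = BookId Then
                Set FindBookById = Node
                Exit Function
            End If
        End If
    Next Node
End Function
```

A caller would test the result with Is Nothing before using it, since an unmatched id leaves the return value unset.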

If the book is to be deleted, I simply remove the book element node from the tree using the root catalog node’s removeChild method. If, on the other hand, the user is editing the book, the returned node is handed off to Form2 via a public property, where the form’s Activate event handler again uses an IXMLDOMNodeList to access the book element’s child nodes—that is, the author, title, genre, price, pub_date, and description elements—to display the book’s data on the form.
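Both outcomes can be sketched in a few lines. The procedure names here are invented for illustration, the code writes to the Immediate window instead of Form2’s controls, and a reference to Microsoft XML, v4.0 is assumed.

```vb
' Hypothetical helpers; assume a reference to "Microsoft XML, v4.0".

' Deleting: removeChild is called on the parent of the node to remove,
' which for a book element is the root catalog element.
Public Sub DeleteBook(ByVal CatalogDoc As MSXML2.DOMDocument40, _
                      ByVal BookNode As MSXML2.IXMLDOMNode)
    CatalogDoc.documentElement.removeChild BookNode
End Sub

' Editing: childNodes returns an IXMLDOMNodeList of the book's children
' (author, title, genre, and so on), which can be walked by index.
Public Sub ShowBookFields(ByVal BookNode As MSXML2.IXMLDOMNode)
    Dim Fields As MSXML2.IXMLDOMNodeList
    Dim i As Long

    Set Fields = BookNode.childNodes
    For i = 0 To Fields.length - 1
        Debug.Print Fields.Item(i).nodeName & " = " & Fields.Item(i).Text
    Next i
End Sub
```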

Swinging through the tree with the children
You can, of course, also move from node to node in the DOM tree using the firstChild, lastChild, nextSibling, and previousSibling navigation properties exposed by the IXMLDOMNode base node interface. Moving around a tree like this is simple, but it takes some planning to ensure that you end up on the node you intended, and you’ll want to add comments so you can remember where you were going when you edit the code later. The cmdOK_Click event handler for Form2, shown in Listing C, uses these navigation properties to move around the book element’s child nodes while updating the book’s data using the nodeValue property of each value-holding child node.
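As a rough illustration of this style of navigation, the sketch below updates the first two fields of a book. It assumes the children appear in the order author, title, and that neither element is empty; the procedure name and parameters are hypothetical, and the code is not taken from Listing C.

```vb
' Hypothetical helper; assumes a reference to "Microsoft XML, v4.0".
' Assumes <author> is the book's first child and <title> its next
' sibling, and that both already contain a text node.
Public Sub UpdateAuthorAndTitle(ByVal BookNode As MSXML2.IXMLDOMNode, _
                                ByVal NewAuthor As String, _
                                ByVal NewTitle As String)
    Dim Field As MSXML2.IXMLDOMNode

    Set Field = BookNode.firstChild             ' <author>
    Field.firstChild.nodeValue = NewAuthor      ' its text node

    Set Field = Field.nextSibling               ' <title>
    Field.firstChild.nodeValue = NewTitle       ' its text node
End Sub
```

The comments naming each element are exactly the kind of extra documentation this approach needs, because nothing in the code itself says which node Field points to at any given step.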

An alternative way to read a text node

Remember that the DOM considers an element’s value to be a separate child node of the element node itself. So in my example, I use the firstChild.nodeValue property to get and set element values. The DOM also provides a special node interface, IXMLDOMText, to represent the value of an element and provide a standard interface for manipulating it.
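Here is a small sketch of working with the text node through IXMLDOMText instead of nodeValue. The sample XML is made up for illustration, and a reference to Microsoft XML, v4.0 is assumed.

```vb
' Assumes a reference to "Microsoft XML, v4.0"; the XML is a placeholder.
Public Sub TextNodeExample()
    Dim Doc As MSXML2.DOMDocument40
    Dim TitleText As MSXML2.IXMLDOMText

    Set Doc = New MSXML2.DOMDocument40
    Doc.loadXML "<book><title>XML Basics</title></book>"

    ' The title element's first child is the text node holding its value.
    Set TitleText = Doc.documentElement.firstChild.firstChild
    Debug.Print TitleText.data

    ' Assigning to data replaces the element's value in the tree.
    TitleText.data = "XML in Depth"
    Debug.Print Doc.xml
End Sub
```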

Validating a document
I’m sure no one needs to be reminded that when you have users touching data, you’re bound to have data entry errors: The two go together like peas and carrots. BookEditor is no exception. Although I could certainly have written it to perform data validation itself, it’s much easier to use an XML schema to validate a user’s input. So BookEditor validates any edits against books.xsd, which appears in Listing C.

There are two ways to associate a schema with a document using the DOM. The first is to add the schema to the DOM parser’s schema cache by creating a new XMLSchemaCache40 object, loading the document’s associated schema or schemas into it, and finally assigning it to the parser’s schemas property. You can see how I do this with BookEditor in Listing D.
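In outline, the schema cache approach might look like the sketch below. It is not the code from Listing D: the function names and the SchemaPath parameter are assumptions, the schema is assumed to have no target namespace, and the project is assumed to reference Microsoft XML, v4.0.

```vb
' Hypothetical helpers; assume a reference to "Microsoft XML, v4.0".
Public Function NewDocumentWithSchema(ByVal SchemaPath As String) _
        As MSXML2.DOMDocument40
    Dim Cache As MSXML2.XMLSchemaCache40
    Dim Doc As MSXML2.DOMDocument40

    ' An empty namespace URI registers a schema with no target namespace.
    Set Cache = New MSXML2.XMLSchemaCache40
    Cache.Add "", SchemaPath

    Set Doc = New MSXML2.DOMDocument40
    Doc.async = False
    Set Doc.schemas = Cache     ' associate the cache with this parser

    Set NewDocumentWithSchema = Doc
End Function

' Validates the in-memory tree against the associated schema cache.
Public Function IsValid(ByVal Doc As MSXML2.DOMDocument40) As Boolean
    Dim ParseErr As MSXML2.IXMLDOMParseError

    Set ParseErr = Doc.validate
    If ParseErr.errorCode <> 0 Then
        Debug.Print "Validation failed: " & ParseErr.reason
    End If
    IsValid = (ParseErr.errorCode = 0)
End Function
```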

But there’s a drawback to this method. If you plan to share the same XML document across multiple applications, each app’s parser must also programmatically associate the schema with the document in this fashion.

To my mind, the preferred method would be to attach the appropriate namespace attributes to the document’s root element and reference the schema as an external document. Then, assuming that the DOM parser can resolve external references (DOMDocument40 can be made to do so via the resolveExternals property), the association is in the document itself and essentially bulletproof.
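For a document whose root element already carries the schema reference, loading with validation could be sketched like this. The function name is hypothetical, the sample root element in the comment is an assumption about how the catalog file is written, and a reference to Microsoft XML, v4.0 is assumed.

```vb
' Hypothetical helper; assumes a reference to "Microsoft XML, v4.0".
' Assumes the file's root element looks something like:
'   <catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
'            xsi:noNamespaceSchemaLocation="books.xsd">
Public Function LoadWithExternalSchema(ByVal FilePath As String) _
        As MSXML2.DOMDocument40
    Dim Doc As MSXML2.DOMDocument40

    Set Doc = New MSXML2.DOMDocument40
    Doc.async = False
    Doc.validateOnParse = True      ' validate while loading
    Doc.resolveExternals = True     ' let the parser fetch the schema

    If Not Doc.Load(FilePath) Then
        Debug.Print "Load failed: " & Doc.parseError.reason
    End If
    Set LoadWithExternalSchema = Doc
End Function
```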

So why does BookEditor make the association programmatically using the schema cache? Because there appears to be a bug (although I haven’t been able to confirm it as such) in DOMDocument40 that will prevent a document from validating correctly if the schema association is made by adding attributes to the root element of the DOM tree in memory. In my case, validation always seemed to fail because of the noNamespaceSchemaLocation attribute. Oddly enough, if a document containing an external schema reference is loaded from a file, it will validate successfully.

I worked around this issue in BookEditor by programmatically associating the schema using the schema cache when creating a new document (as you saw in Listing D). When saving a document to disk, though, I add the external reference attributes to the root catalog element, as you can see in Listing A. The next time the document is parsed, the schema is already associated, so there’s no validation problem.
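The save-time half of that workaround might be sketched as follows. This is an approximation and not the code from Listing A: the procedure name is invented, the schema file name books.xsd is taken from the discussion above, and using setAttribute for the namespace declaration is one plausible way to add the attributes, not necessarily the one BookEditor uses.

```vb
' Hypothetical helper; assumes a reference to "Microsoft XML, v4.0".
Public Sub SaveWithSchemaReference(ByVal CatalogDoc As MSXML2.DOMDocument40, _
                                   ByVal FilePath As String)
    Dim Root As MSXML2.IXMLDOMElement

    Set Root = CatalogDoc.documentElement

    ' Declare the XML Schema instance namespace, then point at the schema.
    Root.setAttribute "xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance"
    Root.setAttribute "xsi:noNamespaceSchemaLocation", "books.xsd"

    ' The saved file now carries its own schema reference.
    CatalogDoc.save FilePath
End Sub
```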
