
Take a guided tour of XML support in .NET

Learn your way around .NET's XML neighborhood. Here's a look at the Framework's various reader and writer classes.


Much has been said about .NET’s use of XML, and unfortunately, a lot of it is hyperbole. Still, two things are undeniable: .NET puts an integrated set of XML tools into the programmer’s hands, and this is really the first time a Microsoft development platform has had integrated XML support out of the box. The support is thorough, too: the System.Xml namespace holds well over 150 classes, a dizzying number. What follows is my attempt to give the uninitiated a guided tour of some of the important XML classes in .NET and a look at the philosophy behind its API.

The “pull model” vs. the DOM and SAX
If you’ve used XML before, you’re probably familiar with the two main flavors of parser: the Document Object Model (DOM) and the Simple API for XML (SAX). These two parser types work in fundamentally different ways. DOM loads an entire XML document into a hierarchical tree in memory, while SAX runs through a document one element at a time and reports each element to the application through a set of callback methods.

The DOM? SAX? What’s that?
If you think “DeLuise” when someone says DOM, or “Coltrane” when someone says SAX, you might want to check out Builder.com’s “Remedial XML” series to get your bearings before you continue. And if you’re VB6-literate, see “Creating XML documents with the DOM in VB6” for more on DOM parsing.

Both APIs have their problems. The DOM is a heavy consumer of memory, especially with large documents, because the whole tree has to be built before you can use any of it. SAX is not very intuitive to use, requires the programmer to keep track of previously processed elements, and doesn’t really provide a way to work with only selected parts of a document.

The .NET XML classes strive for a happy medium between these two APIs, incorporating the best features of both into something Microsoft calls the “pull model.” Not much of a name, I know. I’d have called it XML streams or XML stack, but I suppose those wouldn’t be sexy enough. The idea is that the application pulls the next node from the parser whenever it is ready for it, instead of the parser pushing every node into callbacks the way SAX does. With .NET, you both parse and create XML documents through a simple stream-like interface. Two abstract classes, System.Xml.XmlWriter and System.Xml.XmlReader, provide the basis for .NET’s pull model XML support.
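The difference between pulling and being pushed is easiest to see in code. Python’s standard library offers a comparable pull-style interface in xml.etree.ElementTree.iterparse, so the following is a minimal sketch of the idea in Python under that assumption, not the .NET API. The function first_title and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this illustration.
CATALOG = b"""<catalog>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</catalog>"""


def first_title(source):
    """Return the text of the first <title> element, or None.

    iterparse is pull-style: each pass through the loop asks the parser
    for the next event, so the caller decides when to stop. With SAX, the
    parser would instead push every event into handler callbacks.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # An "end" event means the element and its text are complete.
        if elem.tag == "title":
            return elem.text  # stop asking for events once we have an answer
    return None


print(first_title(io.BytesIO(CATALOG)))  # Dune
```

Because the loop returns early, the application never asks for events past the first title, and its only state ("have I found a title yet?") lives in ordinary control flow rather than in fields of a handler object.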

Writing XML with XmlWriter
The XmlWriter class is essentially an XML-aware wrapper for an output stream that makes creating an XML document a breeze. The class includes methods for writing every kind of XML content, as you can see in Figure A.
Figure A: XmlWriter’s XML content creation methods

WriteAttributeString: Writes an attribute with the specified string value
WriteCData: Writes a CDATA section containing the specified text
WriteComment: Writes a comment: <!-- ... -->
WriteElementString: Writes an element containing a string value
WriteProcessingInstruction: Writes a processing instruction: <? ... ?>
WriteRaw: Writes raw markup to the stream without escaping it
WriteStartDocument: Writes the XML declaration that opens the document
WriteString: Writes text content, escaping characters such as < and &
WriteWhitespace: Writes whitespace

The XmlTextWriter class is a concrete subclass of XmlWriter that you can use in your applications as is. Simply pass an output stream to its constructor and begin writing XML. You could also extend the base XmlWriter class yourself to create a custom writer if you so desire.
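Python has no XmlTextWriter, but its standard xml.sax.saxutils.XMLGenerator plays a similar role: it wraps a text output stream and escapes content as it writes. The sketch below is an analogy under that assumption, not .NET code; build_book_xml and write_element_string are hypothetical helpers, the latter loosely mirroring the idea behind WriteElementString.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_element_string(gen, name, text, attrs=None):
    """Hypothetical helper: start tag, escaped text, and end tag in one call."""
    gen.startElement(name, attrs or {})
    gen.characters(text)  # '&', '<' and '>' are escaped by the generator
    gen.endElement(name)


def build_book_xml(book_id, title):
    out = io.StringIO()                        # any text output stream works
    gen = XMLGenerator(out, encoding="utf-8")  # XML-aware wrapper over the stream
    gen.startDocument()                        # writes the XML declaration
    gen.startElement("book", {"id": book_id})  # attribute values are quoted
    write_element_string(gen, "title", title)
    gen.endElement("book")
    gen.endDocument()                          # finish the document
    return out.getvalue()


print(build_book_xml("1", "Dune & Friends"))
```

In this Python version nothing checks that every start tag is eventually closed; that bookkeeping stays with the caller.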

Reading XML with XmlReader
If XmlWriter is a wrapper for an output stream, XmlReader is best viewed as an XML-aware wrapper for an input stream. Each call to its Read method advances to the next node, which lets you traverse a document quickly, but only forward: there’s no going back. You can retrieve the contents of the current node through the Value property. XmlReader visits nodes in document order, which amounts to a depth-first traversal: it reads an element’s children before its next sibling. If that sounds confusing, remember that this is the same order in which you read an XML document yourself, top to bottom. See Figure B if you’re still fuzzy.

Figure B: A depth-first traversal
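To see the depth-first order concretely, the sketch below records each element’s start tag, with its nesting depth, in the order a forward-only reader meets it. It uses Python’s iterparse as an assumed stand-in for successive XmlReader.Read calls; start_order is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET


def start_order(xml_text):
    """Return (depth, tag) pairs in the order the start tags are read."""
    order = []
    depth = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            order.append((depth, elem.tag))
            depth += 1  # this element's children are one level deeper
        else:
            depth -= 1  # back up to the parent's level
    return order


print(start_order("<a><b><c/></b><d/></a>"))
# [(0, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
```

Element c, a child of b, is reported before d, b’s sibling: the reader finishes each branch before moving on to the next.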


The base XmlReader class does not support validation, although it reports well-formedness errors in a document by throwing an XmlException. XmlReader has a few subclasses that extend it with additional capabilities, and as always, you’re free to extend XmlReader yourself to develop a custom parser for your application.
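Because a forward-only reader discovers a well-formedness error only when it reaches the bad markup, an application may already have processed earlier nodes by then. The Python sketch below illustrates that behavior with iterparse, whose ET.ParseError plays roughly the role XmlException plays in .NET; read_until_error is a hypothetical helper, not a library function.

```python
import io
import xml.etree.ElementTree as ET


def read_until_error(xml_bytes):
    """Collect start tags until the document ends or proves malformed.

    Returns (tags_seen, error_position); error_position is None for a
    well-formed document, otherwise a (line, column) tuple from the parser.
    """
    seen = []
    try:
        for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
            seen.append(elem.tag)
    except ET.ParseError as err:
        # Nodes read before the bad markup have already been handled.
        return seen, err.position
    return seen, None


print(read_until_error(b"<a><b></b></a>"))     # (['a', 'b'], None)
print(read_until_error(b"<a><b></b><c></a>"))  # tags a, b, c, then the error position
```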

XmlTextReader for bare-bones parsing
XmlTextReader is a concrete subclass of XmlReader that provides the most basic XML parsing support. It doesn’t validate, but it’s the fastest of the .NET XML readers and is very configurable. Its 14 constructors let you create an XmlTextReader from several kinds of sources, including a file, a URL, or an input stream.

XmlValidatingReader supports validation
XmlValidatingReader goes beyond XmlTextReader by providing XSD, DTD, and XDR validation and by resolving external entities. Schemas are cached in an XmlSchemaCollection; you can add them programmatically through XmlValidatingReader’s Schemas property, or let an XmlUrlResolver (see below) resolve a schema or DTD that the document itself references. The simplest way to create an XmlValidatingReader is to base it on an XmlTextReader using the appropriate constructor overload.

XmlNodeReader for node-based parsing
For those who are simply hooked on the DOM, the XmlNodeReader class layers the XmlReader API over an in-memory DOM tree, so you can traverse DOM documents using the forward-only pull model. To use XmlNodeReader, load the document you want to read into an XmlDocument object, then pass that document (or any XmlNode within it) to the XmlNodeReader constructor and read it node by node.
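The idea of reading an already-loaded tree through a forward-only interface can be sketched in Python by walking a DOM built with xml.dom.minidom through a generator. This is an analogy, not XmlNodeReader itself: iter_nodes is a hypothetical helper, not part of minidom, and it yields every node (text nodes included) one at a time in the same depth-first order a reader would visit them.

```python
from xml.dom import minidom


def iter_nodes(node, depth=0):
    """Yield (depth, node) for node and its descendants in document order.

    The tree is already in memory; the generator only changes how the
    caller consumes it: one node per step, forward only.
    """
    yield depth, node
    for child in node.childNodes:
        yield from iter_nodes(child, depth + 1)


doc = minidom.parseString("<book id='1'><title>Dune</title><year>1965</year></book>")
for depth, node in iter_nodes(doc.documentElement):
    # Elements have no nodeValue (None), so print an empty string for them.
    print("  " * depth + node.nodeName, node.nodeValue or "")
```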

The supporting cast
In addition to the “big three” XML readers, you should be familiar with a variety of utility classes in the System.Xml namespace:
  • XmlResolver is the abstract base class for XmlUrlResolver, which I mentioned earlier. It resolves external entities, schemas, and DTDs, and handles import resolution for XSL and schema documents. By default, XmlUrlResolver provides resolution services for all classes in the System.Xml namespace. You can, of course, extend XmlResolver to provide custom resolution for your application. Most classes expose an XmlResolver property for specifying a custom resolver.
  • XmlConvert is a handy utility class containing a set of static methods you can use to convert XML element data into native .NET types and to encode and decode names that contain characters not allowed in XML names.
  • XmlNameTable provides a faster way to compare names. It stores each distinct element and attribute name found in a document only once, as a single shared string object. You can then compare two names by object reference, which is a less expensive operation than comparing two strings character by character.
  • XmlNode represents a single node in an XML document, providing the standard DOM-like navigation and information members. XmlDocument extends XmlNode to provide the top-level node for a DOM traversal of an XML document.
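The name-table idea comes down to storing each distinct name once and handing back that single shared object, so later comparisons can test identity instead of comparing characters. Here is a minimal Python sketch of the concept; the NameTable class below is hypothetical and is not the .NET class (Python’s built-in sys.intern applies the same technique globally).

```python
class NameTable:
    """Hypothetical name table: one shared string object per distinct name."""

    def __init__(self):
        self._names = {}

    def add(self, name):
        # setdefault stores the first object seen for this name and returns
        # that same object for every later, equal name.
        return self._names.setdefault(name, name)


table = NameTable()
# Build equal strings at runtime so they start out as separate objects.
first = table.add("".join(["bo", "ok"]))
second = table.add("".join(["b", "ook"]))

print(first == second)  # True: same characters
print(first is second)  # True: same object, so an identity check suffices
```

An identity check (`is`) is a single pointer comparison no matter how long the names are, whereas `==` on two distinct string objects may have to compare them character by character.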
