Learn to read and write XML with .NET's XML classes

.NET's XmlTextReader and XmlTextWriter classes provide an easy way to work with XML. Lamont Adams explains how to use these classes and demonstrates with sample code.

The .NET Framework’s pull model for XML makes reading and writing XML files simple, thanks to the classes derived from XmlReader and XmlWriter. Let’s take a look at how to use these classes to work with an XML document. (For a high-level overview of the pull model and .NET’s XML classes, check out “Take a guided tour of XML support in .NET.”)

I’ve written a console-mode application that should serve as a nice, almost practical example. The XMLDemoCSharp project (download the source here) allows a user to maintain a list of books as an XML document. The book catalog has quickly become the canonical XML example, and finding a sample XML book catalog should be easier than falling off a log. In a pinch though, you are free to use the sample document in Listing A. Just don’t say I never do anything for you, okay?

The bulk of the code for XMLDemoCSharp handles the user interface, which is ironic, considering that I chose to write it as a console app to keep things simple. You can effectively ignore most of that stuff, though, and concentrate instead on the Book class, which models the books in a catalog. Book exposes two static methods for retrieving and storing the catalog: LoadBooks and SaveBooks.

Reading XML with XmlTextReader
XMLDemoCSharp first loads the catalog by calling LoadBooks, shown in Listing B. LoadBooks uses the concrete System.Xml.XmlTextReader class to read the contents of the XML document element by element and builds a System.Collections.ArrayList representing the books in the catalog. XmlTextReader is a forward-only parser that works in a fashion similar to SAX, retrieving the elements in the order they appear in a document, from top to bottom. XmlTextReader doesn’t support validation, but it is fast and light on resources. As such, it’s ideal for reading a document when speed is essential or when you aren’t concerned about validating against an XML schema or DTD.

Up the stream with a reader
Although XmlTextReader can handle the details of reading a document from a URL you provide to its constructor, it can also read an XML document from a System.IO.Stream object, as it does in LoadBooks. Streams are flexible input and output devices that give you access not only to files but also to memory and network data. If you aren’t comfortable with .NET’s stream-based input and output, I encourage you to familiarize yourself with it, perhaps by reading this article.
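To make the stream idea concrete, here is a minimal sketch of handing a Stream to XmlTextReader. It is not the code from Listing B: the StreamReadSketch class, the CountElements helper, and the sample XML are all made up for illustration, and a MemoryStream stands in for whatever file or network stream you would really use.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamReadSketch
{
    // Hypothetical helper: counts the start tags in an XML document
    // supplied through any Stream (file, memory, network, and so on).
    public static int CountElements(Stream input)
    {
        XmlTextReader reader = new XmlTextReader(input);
        int count = 0;
        try
        {
            // Read returns false once the end of the document is reached.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        finally
        {
            reader.Close(); // also closes the underlying stream
        }
        return count;
    }

    public static void Main()
    {
        // Made-up sample data; an in-memory stream stands in for a file.
        string xml = "<catalog><book id=\"bk101\"><title>XML Basics</title></book></catalog>";
        Stream input = new MemoryStream(Encoding.UTF8.GetBytes(xml));
        Console.WriteLine(CountElements(input)); // prints 3
    }
}
```

Because the helper only sees a Stream, the same code should work unchanged with a FileStream opened on a catalog file.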

Reading a document with XmlTextReader can be a confusing proposition until you get your head around some basic ideas:
  • XmlTextReader, like the DOM, presents an XML document as a series of nodes, and a node may or may not correspond to a single element in a document. In the case of elements with other elements nested inside them, like <catalog> in my example, each tag is represented as a node. So <catalog> is actually two nodes: one representing the start tag, <catalog>, and another representing the end tag, </catalog>.
  • For data-containing elements, the start tag, the data, and the end tag will be three separate nodes.
  • If an element has nested elements, it will actually be read twice: once when XmlTextReader hits the element’s start tag node, and again after all the nested elements have been processed, when the element’s end tag comes up in the stream.
  • Methods beginning with the word Read will move XmlTextReader’s node pointer to the next node in the stream. Once you’ve moved past a node, you can’t move back to it without starting over from the beginning of the document.
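A quick way to see these ideas in action is to walk a tiny document and record each node the reader reports. The sketch below is only an illustration: the NodeWalkSketch class, the DescribeNodes helper, and the one-book document are assumptions of mine, not part of XMLDemoCSharp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeWalkSketch
{
    // Hypothetical helper: returns one line per start tag, text, or end tag node.
    public static List<string> DescribeNodes(string xml)
    {
        List<string> lines = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    lines.Add("Element " + reader.LocalName);
                    break;
                case XmlNodeType.Text:
                    lines.Add("Text " + reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    lines.Add("EndElement " + reader.LocalName);
                    break;
                // Other node types (declarations, comments, whitespace) are skipped here.
            }
        }
        reader.Close();
        return lines;
    }

    public static void Main()
    {
        // Made-up document with no whitespace between tags.
        string xml = "<catalog><book><title>XML Basics</title></book></catalog>";
        foreach (string line in DescribeNodes(xml))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Element catalog
        // Element book
        // Element title
        // Text XML Basics
        // EndElement title
        // EndElement book
        // EndElement catalog
    }
}
```

Note how <title> shows up as three nodes (start tag, text, end tag), while <catalog> and <book> each appear twice, once at the start tag and once at the end tag.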

Loading the book catalog
In LoadBooks, I first move past the root <catalog> element to the document’s first <book> element by calling the XmlTextReader.ReadStartElement method, which checks that the current node is a start tag and then advances the reader to the next node. I then set a while loop to run until the <catalog> element becomes current again, by checking the LocalName property on each loop iteration. When the reader hits the node representing the </catalog> closing tag, we’re through parsing the document.

The first step in processing a book is to retrieve its id attribute, which I do using the GetAttribute method. Next, I call Read to move to the first nested element of <book>. Each <book> subelement is then processed by another while loop that watches for the </book> end tag to become current, indicating that I’m through processing that particular book. Each data element is processed in the order it occurs in the document, and I examine the LocalName property of each element to determine what to do with the data I retrieve. Notice that I need to call a read method three times for each data element: Read for the start tag node, ReadString for the data node, and Read again for the closing tag node.
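The loading logic described above can be approximated as follows. This is a simplified sketch, not Listing B: the LoadSketchBook class and its title and author fields are hypothetical stand-ins for the real Book class, it returns a generic List rather than an ArrayList, and it assumes every child of <book> holds text only. It also calls ReadElementString, which folds the three steps (move past the start tag, read the text, move past the end tag) into a single call, and it turns off whitespace nodes so that indentation in the file does not get in the way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the article's Book class.
public class LoadSketchBook
{
    public string Id = "";
    public string Title = "";
    public string Author = "";

    public static List<LoadSketchBook> LoadBooks(Stream input)
    {
        List<LoadSketchBook> books = new List<LoadSketchBook>();
        XmlTextReader reader = new XmlTextReader(input);
        reader.WhitespaceHandling = WhitespaceHandling.None; // skip indentation nodes

        reader.ReadStartElement("catalog"); // now on the first <book>, if any
        while (reader.NodeType == XmlNodeType.Element && reader.LocalName == "book")
        {
            LoadSketchBook book = new LoadSketchBook();
            book.Id = reader.GetAttribute("id") ?? "";
            bool isEmpty = reader.IsEmptyElement; // true for <book ... />
            reader.Read(); // step to the first child, or past an empty <book/>
            if (!isEmpty)
            {
                while (reader.NodeType == XmlNodeType.Element)
                {
                    string name = reader.LocalName;
                    string value = reader.ReadElementString(); // start tag, text, end tag
                    if (name == "title") book.Title = value;
                    else if (name == "author") book.Author = value;
                }
                reader.ReadEndElement(); // consume </book>
            }
            books.Add(book);
        }
        reader.Close();
        return books;
    }

    public static void Main()
    {
        // Made-up catalog; the element names are assumptions for this sketch.
        string xml =
            "<?xml version=\"1.0\"?>\n" +
            "<catalog>\n" +
            "  <book id=\"bk101\">\n" +
            "    <title>XML Basics</title>\n" +
            "    <author>Pat Doe</author>\n" +
            "  </book>\n" +
            "</catalog>\n";
        Stream input = new MemoryStream(Encoding.UTF8.GetBytes(xml));
        foreach (LoadSketchBook book in LoadBooks(input))
        {
            Console.WriteLine(book.Id + ": " + book.Title + " by " + book.Author);
        }
        // Prints: bk101: XML Basics by Pat Doe
    }
}
```

The outer loop here stops as soon as the current node is no longer a <book> start tag, which for a well-formed catalog is the same moment the </catalog> end tag becomes current.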

Debugging XmlTextReader applications
While working on XMLDemoCSharp, I of course had to track down and exterminate a few bugs. In the process, I discovered that the big challenge in debugging the app was that it was hard to tell where the reader was in a document: You can’t always tell by examining the LocalName property, since some node types have no names. I found the LineNumber and LinePosition properties of XmlTextReader to be helpful in figuring out where the reader was in a document when it threw an exception. By checking these two properties and taking a quick look at the document you’re parsing, you can usually figure out what’s going on.
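One way to put LineNumber and LinePosition to work is to report them whenever parsing fails. The following is a rough sketch under my own assumptions: the PositionSketch class and DescribeParse helper are hypothetical, and the deliberately malformed document exists only to trigger an XmlException.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PositionSketch
{
    // Hypothetical helper: reads a whole document and, if parsing fails,
    // reports where the reader was when the exception was thrown.
    public static string DescribeParse(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // A real application would process each node here.
            }
            return "parsed without errors";
        }
        catch (XmlException ex)
        {
            return "line " + reader.LineNumber +
                   ", position " + reader.LinePosition +
                   ": " + ex.Message;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Made-up document whose <book> tag is never closed.
        string broken = "<catalog>\n<book>\n</catalog>";
        Console.WriteLine(DescribeParse(broken));
    }
}
```

The same two properties can also be written to the console from inside the read loop, which helps when the reader is on a node that has no name.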

Writing XML with XmlTextWriter
After looking at the catalog and perhaps adding a few new books to it, the user has the option of saving the catalog back to an XML file on disk. XMLDemoCSharp does this inside the SaveBooks method (Listing C), which makes use of the System.Xml.XmlTextWriter class. XmlTextWriter is a simple implementation of the abstract XmlWriter class that provides a fast way of creating XML documents and document fragments. XmlTextWriter doesn’t support validation, but it does expose methods for creating most XML constructs.

Like XmlTextReader, XmlTextWriter maintains an internal node pointer, and each node equates to a tag or piece of data in the document. To write XML to a document, you call one of several methods beginning with the word Write. Each of these methods writes a particular kind of XML construct into the document’s stream. For example, WriteStartElement creates a start tag for an element, WriteEndElement creates an end tag, and WriteElementString creates the start tag, data, and end tag nodes for a data element in one method call.
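Here is a small sketch of those Write methods producing a single element. It is an illustration only: the WriteSketch class, the BuildFragment helper, and the sample values are hypothetical, and the output goes to a StringWriter so the result is easy to inspect.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WriteSketch
{
    // Hypothetical helper: builds one <book> element as a string.
    public static string BuildFragment()
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);

        writer.WriteStartElement("book");                 // <book
        writer.WriteAttributeString("id", "bk101");       //  id="bk101">
        writer.WriteElementString("title", "XML Basics"); // <title>XML Basics</title>
        writer.WriteEndElement();                         // </book>

        writer.Flush();
        string result = output.ToString();
        writer.Close();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(BuildFragment());
        // Prints: <book id="bk101"><title>XML Basics</title></book>
    }
}
```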

Saving the book catalog
Saving the book catalog is a much simpler process than loading it. First, I create a new XmlTextWriter instance using a Stream object passed into SaveBooks. Then, I use the WriteStartDocument method to write the XML version statement to the document. After that’s done, we can start writing out the document’s elements.

Because <catalog> doesn’t contain any data but does contain nested elements, I create its start tag using WriteStartElement. Inside the foreach loop, a <book> element is created for each book in the catalog. First, the start tag is written using WriteStartElement. Then, six data elements representing the data for each book are created using WriteElementString. Finally, the <book> element is closed with an end tag created by calling WriteEndElement.

Once all the books have been written into the document, the root <catalog> element is closed with WriteEndElement. Notice that calls to WriteEndElement don’t specify an element name to close. That’s because XmlTextWriter keeps track internally of the elements you have started and automatically writes the appropriate closing tag for the most recent open one when you call this method. The last thing we need to do is flush and close the writer’s underlying output stream, and the document is saved.
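Putting the saving steps together gives something like the sketch below. Again, this is not Listing C: the SaveSketchBook class is a hypothetical stand-in for Book, it writes only two data elements (title and author) rather than the six in the real catalog, and writing the id as an attribute and indenting the output are my own assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the article's Book class.
public class SaveSketchBook
{
    public string Id;
    public string Title;
    public string Author;

    public SaveSketchBook(string id, string title, string author)
    {
        Id = id;
        Title = title;
        Author = author;
    }

    public static void SaveBooks(Stream output, IEnumerable<SaveSketchBook> books)
    {
        // UTF8Encoding(false) writes UTF-8 without a byte order mark.
        XmlTextWriter writer = new XmlTextWriter(output, new UTF8Encoding(false));
        writer.Formatting = Formatting.Indented; // optional, for readability

        writer.WriteStartDocument();          // the XML declaration
        writer.WriteStartElement("catalog");  // <catalog>
        foreach (SaveSketchBook book in books)
        {
            writer.WriteStartElement("book");
            writer.WriteAttributeString("id", book.Id);
            writer.WriteElementString("title", book.Title);
            writer.WriteElementString("author", book.Author);
            writer.WriteEndElement();         // </book>, no name needed
        }
        writer.WriteEndElement();             // </catalog>
        writer.WriteEndDocument();

        writer.Flush();
        writer.Close(); // also closes the underlying stream
    }

    public static void Main()
    {
        List<SaveSketchBook> books = new List<SaveSketchBook>();
        books.Add(new SaveSketchBook("bk101", "XML Basics", "Pat Doe")); // made-up data

        MemoryStream stream = new MemoryStream();
        SaveBooks(stream, books);

        // ToArray still works after the stream has been closed.
        Console.WriteLine(Encoding.UTF8.GetString(stream.ToArray()));
    }
}
```

Swapping the MemoryStream for a FileStream should be all it takes to write the catalog to disk instead.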

As you can see, using .NET’s pull model classes to read and write XML documents is pretty easy compared to other major XML parsing APIs like SAX and the DOM. The XmlTextReader and XmlTextWriter classes are simple, fast, and straightforward to use, and the lack of validation isn’t a problem if you just need simple XML functionality.



