A better way to create XML documents in .NET

Creating XML documents in .NET may not be hard, but it can be tedious. Here's what you need to know about .NET's streaming model for writing XML.

Creating XML documents is not necessarily difficult, but it can be tedious, especially if you must constantly build similar types of documents. It makes sense to rely on code to handle the repetitious tasks. But how hard, or how easy, is creating XML documents programmatically? It depends on your approach.

The tedious task of writing markup text
The simplistic answer to the question is that creating XML documents is as easy as creating a text file. After all, an XML document is just a text file. However, a more realistic answer is that writing markup text can be bothersome because you have to watch for missing quotes and tags and case sensitivity. In other words, you have to deal with the burden of writing markup text.

In addition, an XML document is hierarchical by nature, so writing it incrementally is a stack-based operation. As you write, you open tags, set attributes, and add children, all while keeping track of the innermost element that is still open. XML syntax requires you to close the innermost element first so that elements never overlap.
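
To make that bookkeeping concrete, here is a minimal sketch of writing markup by hand. The class name ManualMarkup and all of its members are hypothetical, invented for illustration; the point is only that your own code has to maintain the stack of open elements and escape special characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: writes markup by hand and tracks open elements on a stack.
public class ManualMarkup
{
    private readonly StringBuilder _text = new StringBuilder();
    private readonly Stack<string> _open = new Stack<string>();

    // Escapes the characters that would otherwise break the markup.
    private static string Escape(string value)
    {
        return value.Replace("&", "&amp;").Replace("<", "&lt;")
                    .Replace(">", "&gt;").Replace("\"", "&quot;");
    }

    // Writes an opening tag with a single attribute and records the element as open.
    public void Open(string name, string attrName, string attrValue)
    {
        _text.Append('<').Append(name).Append(' ').Append(attrName)
             .Append("=\"").Append(Escape(attrValue)).Append("\">");
        _open.Push(name);
    }

    // Appends escaped text to the innermost open element.
    public void Text(string value)
    {
        _text.Append(Escape(value));
    }

    // Closes the innermost open element, which is the only one that may legally be closed.
    public void Close()
    {
        if (_open.Count == 0)
            throw new InvalidOperationException("No open element to close.");
        _text.Append("</").Append(_open.Pop()).Append('>');
    }

    public override string ToString()
    {
        return _text.ToString();
    }
}
```

Even this toy version has to get quoting, escaping, and closing order right, and it still does nothing to stop you from passing an illegal element name.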

Writing the XML DOM way
The XML document object model (XML DOM) allows you to create XML documents by composition. You use a set of factory methods (CreateElement, CreateComment, CreateProcessingInstruction, and so forth) to create instances of node objects and then relate them to form a tree-based structure. These methods, though, create the document in memory. So how do you persist a new document?
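
As an illustration of composition, the following sketch builds a tiny document with the .NET XmlDocument class. The class name DomSample and the element and attribute names are assumptions made up for this example.

```csharp
using System;
using System.Xml;

public static class DomSample
{
    // Builds a small document in memory by creating nodes and relating them.
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();

        // Factory methods create nodes; AppendChild places them in the tree.
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", null, null));
        doc.AppendChild(doc.CreateComment(" built with the XML DOM "));

        XmlElement root = doc.CreateElement("folders");
        doc.AppendChild(root);

        XmlElement folder = doc.CreateElement("folder");
        folder.SetAttribute("name", "docs");
        folder.AppendChild(doc.CreateTextNode("Work documents"));
        root.AppendChild(folder);

        return doc;
    }
}
```

Nothing here touches the disk. With the .NET implementation, a call such as DomSample.Build().Save("folders.xml") would write the tree out in one step.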

Through Level 2, the W3C DOM offers no support for I/O in its official API. A pair of Load and Save methods is due to become part of the standard with DOM Level 3, which is in the final stages of the recommendation process. The lack of support is not a problem for real-world applications, but bear in mind that, for now, any Save method you call on an XML DOM-like structure is a proprietary extension to the W3C DOM. Microsoft's XML Core Services (MSXML) has supported document persistence since its first version.

The key advantage of the XML DOM is that it provides a layer of abstraction, saving you, poor programmer, from the burden of dealing with the restrictive rules of well-formed XML. The idea is that you define the structure, and the framework takes care of the details of content-to-XML translation. The downside of the XML DOM approach is the memory footprint, which grows with the size of the document. The document is kept entirely in memory until it is persisted to a storage medium, which is far from optimal if you're dealing with large documents.

The .NET streaming model
The Microsoft .NET Framework provides a more productive, effective, and even elegant approach to writing XML code programmatically. Based on XML writer components, the approach represents the writing counterpart of the stream-based parsing model I discussed in a previous article.

An XML writer is a component that provides a fast, forward-only way of outputting XML data to streams or files. More important, an XML writer is designed so that the markup it produces conforms to the W3C XML 1.0 and Namespaces in XML recommendations.

XML writers differ from XML DOM objects in that they cache much less information. An XML writer is not an in-memory representation of the document being created. It is a simple writing instrument that buffers only the XML text it has produced so far. Unlike an XML DOM document, which stays in memory until it is saved, the writer's buffer can be flushed at any time to the destination: a local disk file, a stream to a remote resource, or any other stream object.

To some extent, you can see the XML writer component as an abstract API built on top of a data stream. Instead of having methods to write strings or an array of bytes, an XML writer provides methods to write XML elements and attributes. Let's see a practical example.

An XML directory listing
Suppose you have to write a class that persists a directory listing to XML. The code in Listing A shows how to proceed. The code creates a new XmlTextWriter and begins to add elements. The directory information is retrieved using the DirectoryInfo class and its GetDirectories method.
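
What follows is a minimal sketch of such a class, not the original listing. The class name DirectoryListing, the method name Write, and the element and attribute names (folders, folder, path, name) are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

// Illustrative sketch: persists the subdirectories of a folder to an XML file.
public static class DirectoryListing
{
    public static void Write(string path, string outputFile)
    {
        DirectoryInfo dir = new DirectoryInfo(path);

        // A null encoding makes the writer produce UTF-8 output.
        XmlTextWriter writer = new XmlTextWriter(outputFile, null);
        try
        {
            writer.Formatting = Formatting.Indented;

            writer.WriteStartDocument();
            writer.WriteStartElement("folders");
            writer.WriteAttributeString("path", dir.FullName);

            // One <folder> element per immediate subdirectory.
            foreach (DirectoryInfo sub in dir.GetDirectories())
            {
                writer.WriteStartElement("folder");
                writer.WriteAttributeString("name", sub.Name);
                writer.WriteEndElement();
            }

            writer.WriteEndElement();   // </folders>
            writer.WriteEndDocument();
        }
        finally
        {
            // Close flushes any buffered text and releases the file.
            writer.Close();
        }
    }
}
```

A call such as DirectoryListing.Write(@"C:\Projects", "listing.xml") would write one folder element for each subdirectory found under the given path.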

The XmlTextWriter class is the tool you use to create an XML document in a disk file. The null argument passed to the class constructor selects the default encoding scheme (UTF-8) for the document. The Formatting property turns on automatic indentation of the document's lines.

WriteStartDocument and WriteEndDocument bracket the document's writing. The former method inserts the standard XML prologue with the XML version and, if an encoding was specified, the encoding information. The latter method closes all pending elements and resets the internal state of the writer object. In between these two calls, you use other WriteXXX methods to create specific XML items such as element nodes, attributes, and comments.

An element node is wrapped by paired calls to WriteStartElement and WriteEndElement. WriteStartElement corresponds to the opening tag; WriteEndElement corresponds to the closing tag. Attributes for the currently open element (the top of the stack) are set using the WriteAttributeString method. Finally, WriteString inserts plain text in the body of an element node.
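
The interplay of these four methods is easier to see in a small fragment. The sketch below writes to a StringWriter so the result can be inspected; the class name ElementSample and the element names are assumptions made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ElementSample
{
    // Returns the markup produced by a few nested Write calls.
    public static string Build()
    {
        StringWriter text = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(text);

        writer.WriteStartElement("folder");           // opens <folder
        writer.WriteAttributeString("name", "docs");  // attribute of the open element
        writer.WriteStartElement("note");             // child element; <folder> stays open
        writer.WriteString("R&D < sales");            // text; markup characters are escaped
        writer.WriteEndElement();                     // closes the innermost element, <note>
        writer.WriteEndElement();                     // closes <folder>
        writer.Close();

        return text.ToString();
    }
}
```

With the default settings (no indentation, double quotes), ElementSample.Build() returns `<folder name="docs"><note>R&amp;D &lt; sales</note></folder>`. The writer handled the quoting, the escaping, and the closing order.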

The writer buffers its output, so text does not necessarily reach the underlying stream as soon as you write it; everything is guaranteed to be written out only when you call Flush or Close. If you're going to write a large document, the Flush method lets you keep memory usage down. Flush can be called at any time during the document's creation; it empties the internal buffer and updates the underlying stream. The underlying stream stays open, and an output file stays locked, until you close the XML writer.
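
The effect of Flush can be observed by writing to a MemoryStream and checking its length. This is a sketch under assumptions: the class name FlushSample is invented, and the exact byte counts depend on the buffering of the underlying writer, so the code only compares the two values.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FlushSample
{
    // Returns the stream length measured before and after an explicit Flush.
    public static long[] Measure()
    {
        MemoryStream stream = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(stream, null);

        writer.WriteStartDocument();
        writer.WriteStartElement("folders");
        writer.WriteElementString("folder", "docs");

        // The text written so far may still sit in the writer's buffer.
        long before = stream.Length;

        // Flush pushes the buffered text to the stream without closing anything.
        writer.Flush();
        long after = stream.Length;

        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Close();   // flushes the remainder and closes the stream

        return new long[] { before, after };
    }
}
```

In a long-running loop, calling Flush every few hundred elements is one way to bound how much text accumulates before it reaches the file.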

The XML writer is a great helper tool, but it's not perfect. It does not validate the contents against a schema or a DTD document, nor does it fix erroneous information you may pass. For example, if you added the same attribute twice, no error would be signaled.

Beyond markup
Readers and writers are at the foundation of every XML I/O operation in the .NET Framework. By using XML readers, you parse documents in a more cost-effective way. By using XML writers, you go far beyond markup to reach a node-oriented dimension in which, instead of just accumulating bytes in a block of contiguous memory, you assemble nodes and entities to create the desired schema and infoset.


