When you need to process XML documents, you must first
decide whether to use DOM (Document Object Model) or SAX (Simple API for XML), the
two main XML APIs in use today. You can use either (or both at the same time)
to process XML documents, but DOM loads the entire document into memory to process it.
SAX, on the other hand, can examine an incoming XML stream so that not all of
the XML code need reside in memory simultaneously.

You choose between DOM and SAX in much the same way that you
might choose between tables or views in a database: Select the approach that
suits the situation. If you want to simply explore an XML document and not
manipulate it, then choose SAX.

The differences between SAX and DOM

There are a number of key distinctions between SAX and DOM,
including:


DOM is preferred for complicated jobs, such as when the XML schema is
inherently intricate or when you need random access to the data in the
document. SAX moves in a linear fashion from the start of the document down
through each node to locate a particular node or otherwise provide information
about the document.


DOM builds a type description for every node in the XML document it loads into
memory. Collectively, these descriptions result in an easily traversable,
though potentially huge, tree structure. If the XML is verbose, the DOM tree can
inflate dramatically. For example, a 300-KB XML document can result in a 3,000,000-byte
DOM tree structure in RAM or virtual memory. By contrast, SAX does not
deconstruct the document at all, nor does it cache it in memory (though, of course, parts of
it reside briefly in memory buffers as the XML stream is read through). SAX is
a “lighter” technology that imposes little burden on your system. SAX is the equivalent
of watching a marathon go by; DOM is like inviting all the racers home for
dinner.
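To make the random-access point concrete, here is a minimal sketch that uses the .NET XmlDocument class as the DOM implementation. The sample markup, the module name, and the FindTitle helper are my own illustrations, not part of any listing in this article. Because the whole tree is in memory, the code can ask for the second book before the first.

```vbnet
Imports System
Imports System.Xml

Module DomRandomAccess
    ' Hypothetical sample data; any well-formed XML would do.
    Public Const SampleXml As String = _
        "<books>" & _
        "<book id=""b1""><title>First</title></book>" & _
        "<book id=""b2""><title>Second</title></book>" & _
        "</books>"

    ' Loads the whole document into a DOM tree, then jumps straight to the
    ' requested book with an XPath query. Returns Nothing if there is no match.
    ' (A sketch only: bookId is concatenated into the XPath without escaping.)
    Public Function FindTitle(xmlText As String, bookId As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        Dim node As XmlNode = _
            doc.SelectSingleNode("/books/book[@id='" & bookId & "']/title")
        If node Is Nothing Then Return Nothing
        Return node.InnerText
    End Function

    Sub Main()
        ' The order of access is up to the caller, not the document.
        Console.WriteLine(FindTitle(SampleXml, "b2"))
        Console.WriteLine(FindTitle(SampleXml, "b1"))
    End Sub
End Module
```

The price of this convenience is the memory cost described above: LoadXml builds the full tree before the first query runs.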

So which do you choose? If you’re doing something
complicated such as advanced XSLT transformations or XPath filtering, choose
DOM. You’d also pick DOM if you’re actually creating or modifying the XML
documents.

On the other hand, choose SAX for searching or reading XML
documents. SAX can quickly scan a large XML document, then stop when it finds a
match to your search criterion and hand you the appropriate fragment from the
document.
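The scan-and-stop pattern can be sketched as follows. The .NET Framework does not ship a SAX parser, so this example assumes the forward-only XmlTextReader (introduced later in this article) as a stand-in for a streaming parser. The element and attribute names, and the FindBookFragment helper, are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ScanAndStop
    ' Scans forward through the markup and returns the fragment for the first
    ' <book> whose id attribute matches, or Nothing if there is no such element.
    Public Function FindBookFragment(xmlText As String, bookId As String) As String
        Using reader As New XmlTextReader(New StringReader(xmlText))
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element AndAlso _
                   reader.Name = "book" AndAlso _
                   reader.GetAttribute("id") = bookId Then
                    ' Stop at the match; the rest of the input is never parsed.
                    Return reader.ReadOuterXml()
                End If
            End While
        End Using
        Return Nothing
    End Function

    Sub Main()
        Dim xmlText As String = _
            "<books>" & _
            "<book id=""b1""><title>First</title></book>" & _
            "<book id=""b2""><title>Second</title></book>" & _
            "</books>"
        Console.WriteLine(FindBookFragment(xmlText, "b1"))
    End Sub
End Module
```

Only one node at a time is held by the reader, which is why this style suits large documents.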

In some situations, the best choice is to employ both DOM
and SAX for different aspects of a single solution. For example, you might want
to load XML into memory and modify it with DOM, but then transmit the final
result by emitting a SAX stream from the DOM tree.
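A rough .NET analog of this combined approach is to modify the document with XmlDocument and then walk the finished tree with XmlNodeReader, which exposes a DOM tree as a forward-only stream. Note the assumption: XmlNodeReader is a pull-style reader rather than a true SAX event source, and the "reviewed" attribute and ModifyThenStream helper are invented for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module DomThenStream
    ' Modifies the document in memory with DOM, then streams the result
    ' through a forward-only XmlNodeReader and returns a label per element.
    Public Function ModifyThenStream(xmlText As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        ' In-memory modification: mark the root element as reviewed.
        doc.DocumentElement.SetAttribute("reviewed", "true")

        Dim labels As New List(Of String)()
        Using reader As New XmlNodeReader(doc)
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element Then
                    Dim label As String = reader.Name
                    Dim reviewed As String = reader.GetAttribute("reviewed")
                    If reviewed IsNot Nothing Then label &= " reviewed=" & reviewed
                    labels.Add(label)
                End If
            End While
        End Using
        Return labels
    End Function

    Sub Main()
        For Each label As String In ModifyThenStream("<books><book/><book/></books>")
            Console.WriteLine(label)
        Next
    End Sub
End Module
```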

Using the XmlReader class

If you’re interested in employing SAX, it’s free and you can
find considerable help at the SAX Project page. You can also use SAX within
Microsoft’s Visual Studio, which also offers a more flexible alternative to the
traditional SAX API. The XmlReader class provides the efficiencies and advantages
of SAX, but it adds the ability to easily customize the behaviors available in
the class. Though both SAX and XmlReader are forward-only, read-only systems,
with XmlReader you can skip forward if you want to. For example, you can employ
the reader’s MoveToContent and Skip methods to jump past nodes you don’t care
about, rather than slogging serially through every node in the document and
having each one reported to your code as you go.
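Here is a minimal sketch of MoveToContent and Skip at work. It assumes a made-up catalog document in which a bulky history element precedes the book element we actually want. The sample markup and the FirstBookSkippingHistory helper are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module SkipDemo
    ' Hypothetical sample: a <history> subtree we do not care about,
    ' followed by the <book> element we want.
    Public Const SampleXml As String = _
        "<?xml version=""1.0""?>" & _
        "<catalog>" & _
        "<history><entry>a</entry><entry>b</entry></history>" & _
        "<book>Wanted</book>" & _
        "</catalog>"

    ' Returns the text of the first <book> element, jumping over the whole
    ' <history> subtree instead of visiting each node inside it.
    Public Function FirstBookSkippingHistory(xmlText As String) As String
        Using reader As New XmlTextReader(New StringReader(xmlText))
            reader.MoveToContent()   ' moves past the XML declaration to <catalog>
            reader.Read()            ' steps inside <catalog>
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element Then
                    If reader.Name = "history" Then
                        reader.Skip()    ' lands on the node after </history>
                        Continue While
                    ElseIf reader.Name = "book" Then
                        Return reader.ReadElementContentAsString()
                    End If
                End If
                reader.Read()
            End While
        End Using
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(FirstBookSkippingHistory(SampleXml))
    End Sub
End Module
```

The reader still has to scan past the skipped markup internally, but your code never sees those nodes.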

Another primary advantage of the XmlReader class is that your
code pulls each XML node from the reader when it is ready for it (rather than
having nodes pushed to it through callbacks, as SAX does).
This allows you to more effectively manage some kinds of data. For instance,
with XmlReader, it’s relatively straightforward to examine multiple input
streams simultaneously.
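One way to picture the pull model with multiple inputs is a lockstep comparison: the code asks each reader for its next element in turn and stops as soon as the two disagree. This is only a sketch under my own assumptions. The SameElementSequence and NextElementName helpers are invented for illustration, and the inputs are in-memory strings rather than real streams.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module LockstepDemo
    ' Advances the reader to its next element and returns the element name,
    ' or Nothing when the input is exhausted.
    Private Function NextElementName(reader As XmlReader) As String
        While reader.Read()
            If reader.NodeType = XmlNodeType.Element Then Return reader.Name
        End While
        Return Nothing
    End Function

    ' Pulls from two readers in lockstep and reports whether both inputs
    ' contain the same element names in the same order.
    Public Function SameElementSequence(xmlA As String, xmlB As String) As Boolean
        Using a As New XmlTextReader(New StringReader(xmlA)), _
              b As New XmlTextReader(New StringReader(xmlB))
            Dim nameA As String
            Dim nameB As String
            Do
                nameA = NextElementName(a)
                nameB = NextElementName(b)
                If Not String.Equals(nameA, nameB) Then Return False
            Loop Until nameA Is Nothing   ' both inputs ended together
            Return True
        End Using
    End Function

    Sub Main()
        Console.WriteLine(SameElementSequence("<a><b/><c/></a>", "<a><b>text</b><c x=""1""/></a>"))
        Console.WriteLine(SameElementSequence("<a><b/></a>", "<a><c/></a>"))
    End Sub
End Module
```

With a push parser, interleaving two inputs this way would mean coordinating two sets of callbacks. Here the calling code simply decides which reader to advance next.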

To get an idea of how to use XmlReader, start a new
Windows-style Visual Basic .NET project in Visual Studio and add the following
namespace references at the top of the code window:

Imports System.Xml
Imports System.IO

Now copy and paste the code from Listing A into the Form_Load event.
Note that the code actually instantiates an XmlTextReader object, which is
derived from the XmlReader abstract class.

Before trying to execute the code, you must substitute the
path of an actual .XML file on your hard drive (any .XML will do) for the “c:\books.xml”
string in this line of the code:

Dim Xr = New XmlTextReader("c:\books.xml")

Once you’ve done that, you can press [F5] to execute your
program, and you’ll see that the XmlReader in this code has parsed the document
and can report its number of elements and attributes.
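For readers without the listing at hand, the following is a minimal sketch of how such a count might be written. It is not the article's Listing A. It is a console module rather than a Form_Load handler, the CountNodes helper is my own, and it reads an inline sample so that it runs without a file on disk.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module CountDemo
    ' Reads every node once and tallies element nodes and their attributes.
    Public Sub CountNodes(reader As XmlReader, _
                          ByRef elements As Integer, _
                          ByRef attributes As Integer)
        elements = 0
        attributes = 0
        While reader.Read()
            If reader.NodeType = XmlNodeType.Element Then
                elements += 1
                attributes += reader.AttributeCount
            End If
        End While
    End Sub

    Sub Main()
        ' Inline sample so the sketch is self-contained. To read a file instead,
        ' construct the reader with a path: New XmlTextReader("c:\books.xml")
        Dim sample As String = _
            "<books><book id=""1"" lang=""en""><title>A</title></book><book id=""2""/></books>"
        Dim Xr As New XmlTextReader(New StringReader(sample))
        Dim elementCount As Integer
        Dim attributeCount As Integer
        CountNodes(Xr, elementCount, attributeCount)
        Xr.Close()
        Console.WriteLine("Elements: " & elementCount)
        Console.WriteLine("Attributes: " & attributeCount)
    End Sub
End Module
```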

When you instantiate an XmlTextReader, you can
simultaneously provide its constructor with the target XML filepath in a
string, as I did in this code. However, this is a heavily overloaded
constructor, so you can provide a variety of arguments when instantiating an
XmlTextReader: path, Stream, TextReader, XmlNameTable, XmlNodeType,
XmlParserContext, and various combinations of these objects.
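A few of those overloads can be sketched side by side. The example below builds a reader from a TextReader, from a Stream, and from a Stream plus an XmlNameTable. The sample markup and the RootName helper are hypothetical, and in-memory sources stand in for files or network streams.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module ConstructorDemo
    ' Returns the name of the root element, however the reader was constructed,
    ' and closes the reader when done.
    Public Function RootName(reader As XmlTextReader) As String
        Try
            reader.MoveToContent()
            Return reader.Name
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main()
        Dim xmlText As String = "<books><book/></books>"
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(xmlText)

        ' From a TextReader (here a StringReader over in-memory text).
        Console.WriteLine(RootName(New XmlTextReader(New StringReader(xmlText))))

        ' From a Stream (here a MemoryStream; a FileStream works the same way).
        Console.WriteLine(RootName(New XmlTextReader(New MemoryStream(bytes))))

        ' From a Stream plus an XmlNameTable for atomized name strings.
        Console.WriteLine(RootName(New XmlTextReader(New MemoryStream(bytes), New NameTable())))
    End Sub
End Module
```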

In Listing B, both the schema and data are extracted from a string argument.
Assume that the caller in this situation has read a node from a stream and
presents it to this XmlTextReader for analysis.
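The fragment-parsing constructor can be sketched roughly as follows. This is not the article's Listing B. It only shows the overload that accepts a string fragment, an XmlNodeType, and an XmlParserContext, and the ReadAttribute helper and sample fragment are my own assumptions.

```vbnet
Imports System
Imports System.Xml

Module FragmentDemo
    ' Parses a standalone XML fragment held in a string and returns the value
    ' of the named attribute on its first element (Nothing if absent).
    Public Function ReadAttribute(fragment As String, attributeName As String) As String
        ' The parser context supplies what a full document would normally
        ' provide: a name table and a namespace manager.
        Dim nt As New NameTable()
        Dim nsmgr As New XmlNamespaceManager(nt)
        Dim context As New XmlParserContext(nt, nsmgr, Nothing, XmlSpace.None)

        Dim reader As New XmlTextReader(fragment, XmlNodeType.Element, context)
        Try
            reader.MoveToContent()
            Return reader.GetAttribute(attributeName)
        Finally
            reader.Close()
        End Try
    End Function

    Sub Main()
        Dim fragment As String = "<book id=""b7"" genre=""novel""><title>T</title></book>"
        Console.WriteLine(ReadAttribute(fragment, "genre"))
    End Sub
End Module
```

Passing XmlNodeType.Element tells the reader to treat the string as element content rather than as a complete document.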