Pulling XML forward with the .NET Framework XmlReader object

Microsoft has introduced a third XML parsing paradigm called the pull model. This model attempts to provide non-cached, forward-only, read-only access to XML data.

In the past, there were only two ways to parse an XML file—SAX (Simple API for XML) and DOM (Document Object Model). The first reads an XML file in a sequential manner and signals the application as it finds different XML components like elements and attributes, while the second creates a tree representation of the data in the XML document, and then offers various methods to navigate through this data.

While each of these two techniques comes with its own set of pros and cons, Microsoft has introduced a third paradigm with the .NET Framework called the "pull model," which provides non-cached, forward-only, read-only access to XML data. In layman's terms, this means that you read the information in an XML file sequentially (like SAX), but your code asks the parser for each node instead of waiting to be called back, and it has the additional option of skipping over certain elements and their content at runtime (much as you would ignore a branch of a DOM tree). A new abstract class, XmlReader(), has been created to perform this task.
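As a rough illustration of the "skipping" idea, here is a minimal C# sketch. It is not one of the article's listings: the sample document, the SkipSketch class, and the ReadTitles helper are all made up for this example. It relies only on the reader's Read(), Skip(), and ReadString() methods and its NodeType, Name, and EOF properties.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class SkipSketch
{
    // Hypothetical document, invented for this illustration.
    public const string SampleXml =
        "<library>" +
        "<book><title>Kept</title></book>" +
        "<archive><book><title>Skipped</title></book></archive>" +
        "<book><title>Also kept</title></book>" +
        "</library>";

    // Collects the text of every <title> that is not inside an <archive>.
    public static List<string> ReadTitles(string xml)
    {
        List<string> titles = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));

        reader.Read(); // move onto the first node
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "archive")
            {
                // Skip() jumps over the whole <archive> subtree and leaves
                // the reader on the node that follows its end tag.
                reader.Skip();
                continue;
            }

            if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
            {
                // ReadString() returns the text content of the element.
                titles.Add(reader.ReadString());
            }

            reader.Read(); // pull the next node
        }

        reader.Close();
        return titles;
    }

    public static void Main()
    {
        foreach (string title in ReadTitles(SampleXml))
        {
            Console.WriteLine(title);
        }
    }
}
```

The point of the sketch is that the calling code decides when to advance and when to jump ahead; the content of the skipped subtree is never handed to the application.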

Processing XML

To see how the XmlReader() works, create the following XML file and save it as temperatures.xml.

<?xml version="1.0" encoding="utf-8" ?>
<temperature unit="Celsius">
    <city name="New York">
        <month name="January">3.9</month>
        <month name="February">5.6</month>
        <month name="March">15.6</month>
        <month name="April">21.7</month>
        <month name="May">26.1</month>
        <month name="June">29.4</month>
        <month name="July">28.3</month>
        <month name="August">24.4</month>
        <month name="September">18.3</month>
        <month name="October">12.2</month>
        <month name="November">6.7</month>
        <month name="December">10.0</month>
    </city>
</temperature>

Nothing fancy here: just a list of temperature highs for twelve months of the year in New York. The task is to calculate the yearly average from these temperatures, which is performed by the script in Listing A.

The output of this script should look like Figure A.

Figure A


If you look carefully at the script in Listing A, you're likely to be confused, because the first thing it does is create an instance of the XmlTextReader() object to read the temperatures.xml file. Don't worry, there's a simple reason. The XmlReader() class is an abstract class: it cannot be instantiated directly, but developers can extend it to build their own parser using the features of the XML pull paradigm. However, if you don't have the technical skills to write your own parser, or are just plain lazy (like me), you can use the XmlTextReader() class instead. It's a built-in derivation of the abstract XmlReader() class.

Consistent with other members of the System.Xml assembly, the XmlTextReader() class comes with a Read() method that returns true each time it successfully reads the next XML node, and false once there are no more nodes to read. When used in a while() loop, this method ensures that the entire file is processed.

Within the while() loop, I've used the NodeType property to identify element nodes and the Name property to find out which element the reader is on (the same properties apply when the reader is positioned on an attribute). I'm only interested in the unit attribute of the <temperature> element and the name attribute of the <city> element. The values of these attributes are retrieved with the GetAttribute() method. Temperature values for each <month> are obtained with the ReadString() method. Once the individual month values are obtained, the next step is to calculate the yearly average. This is accomplished by adding all the values and dividing the sum by the number of months.
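Listing A itself is not reproduced here, so the following C# sketch is only an approximation of the steps just described, not the original script. The TemperatureAverage class, the Average helper, and the shortened three-month sample are assumptions made for this example; in practice you would pass a reader over temperatures.xml instead.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

public class TemperatureAverage
{
    // Shortened, hypothetical version of temperatures.xml (three months only).
    public const string SampleXml =
        "<temperature unit='Celsius'>" +
        "<city name='New York'>" +
        "<month name='January'>3.9</month>" +
        "<month name='February'>5.6</month>" +
        "<month name='March'>15.6</month>" +
        "</city>" +
        "</temperature>";

    // Returns the mean of all <month> values; also reports the unit and city.
    public static double Average(TextReader input, out string unit, out string city)
    {
        unit = null;
        city = null;
        double sum = 0.0;
        int count = 0;

        XmlTextReader reader = new XmlTextReader(input);
        while (reader.Read())
        {
            // Only element nodes are of interest; whitespace and end tags are ignored.
            if (reader.NodeType != XmlNodeType.Element)
            {
                continue;
            }

            switch (reader.Name)
            {
                case "temperature":
                    unit = reader.GetAttribute("unit");
                    break;
                case "city":
                    city = reader.GetAttribute("name");
                    break;
                case "month":
                    // ReadString() returns the element's text, e.g. "3.9".
                    sum += double.Parse(reader.ReadString(), CultureInfo.InvariantCulture);
                    count++;
                    break;
            }
        }
        reader.Close();

        // Guard against a document with no <month> elements.
        return count == 0 ? 0.0 : sum / count;
    }

    public static void Main()
    {
        string unit;
        string city;
        double average = Average(new StringReader(SampleXml), out unit, out city);
        Console.WriteLine(city + ": " +
            average.ToString("F1", CultureInfo.InvariantCulture) + " " + unit);
    }
}
```

Parsing with CultureInfo.InvariantCulture is a deliberate choice in this sketch: it keeps "3.9" from being misread on machines whose regional settings use a comma as the decimal separator.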

Validating XML

Every XML document must be well-formed, and ideally it should also be valid. Any XML parser can check that a file is well-formed, but only a validating parser can check it against a DTD, an XDR schema, or an XSD schema. The System.Xml assembly provides one such parser in the form of the XmlValidatingReader() object, which is another derivation of the abstract XmlReader() class. Consider the XSD schema shown in Listing B and its XML document instance shown in Listing C.

Listing D shows an example of validating a document instance against a schema. Load this example in your browser, and you'll see the XML file being successfully validated by the parser, looking something like Figure B.

Figure B


To make things interesting, edit the above XML file and delete the <netassets> element for the third mutual fund. When you attempt to validate the file again, the XmlValidatingReader() will tell you that the XML file is invalid (Figure C).

Figure C

Invalid XML

As before, I begin by initializing an XmlTextReader() object with the required XML file, and then pass it to the constructor of the XmlValidatingReader() object. The ValidationType property sets the kind of validation to perform; its values are ValidationType.Schema, Auto, DTD, XDR, and None.

Next, a handler is attached to the ValidationEventHandler event; this handler is triggered whenever the parser encounters an error while validating the file. The Read() method takes care of processing the file, and the blnValFlag variable serves as a flag to determine whether the validation process was successful.
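Since Listings B through D are not reproduced here, the C# sketch below only approximates the validation steps just described. The tiny fund schema, the two sample documents, the ValidationSketch class, and the isValid flag (standing in for blnValFlag) are all assumptions made for illustration. Note also that later versions of the .NET Framework mark XmlValidatingReader() as obsolete, so expect a compiler warning there.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationSketch
{
    // Hypothetical schema: a <fund> must contain <name> followed by <netassets>.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='fund'>" +
        "<xs:complexType><xs:sequence>" +
        "<xs:element name='name' type='xs:string'/>" +
        "<xs:element name='netassets' type='xs:decimal'/>" +
        "</xs:sequence></xs:complexType>" +
        "</xs:element>" +
        "</xs:schema>";

    // Hypothetical instances: one complete, one missing <netassets>.
    public const string ValidXml =
        "<fund><name>Sample Fund</name><netassets>100.5</netassets></fund>";
    public const string InvalidXml =
        "<fund><name>Sample Fund</name></fund>";

    // Plays the role of the article's blnValFlag.
    private static bool isValid;

    // Called by the validating reader for each validation problem it finds.
    private static void OnValidation(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
        {
            isValid = false;
            Console.WriteLine("Validation error: " + e.Message);
        }
    }

    public static bool Validate(string xsd, string xml)
    {
        isValid = true;

        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader validator = new XmlValidatingReader(textReader);
        validator.ValidationType = ValidationType.Schema;

        // A null namespace tells the collection to use the schema's own
        // targetNamespace (none, in this sketch).
        validator.Schemas.Add(null, new XmlTextReader(new StringReader(xsd)));
        validator.ValidationEventHandler += new ValidationEventHandler(OnValidation);

        // Validation happens as a side effect of reading each node.
        while (validator.Read())
        {
        }
        validator.Close();

        return isValid;
    }

    public static void Main()
    {
        Console.WriteLine("Complete document valid: " + Validate(Xsd, ValidXml));
        Console.WriteLine("Incomplete document valid: " + Validate(Xsd, InvalidXml));
    }
}
```

Because the reader validates while it reads, the empty while() loop is all that is needed to check the whole document; the handler simply records whether any error was reported along the way.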

A solid foundation

The abstract XmlReader() class thus lays the foundation for the derived XmlTextReader() and XmlValidatingReader() objects, and makes it possible to do all kinds of nifty things. Will it succeed in pulling in the crowds to use this solution over SAX and DOM? Only time will tell.
