In the past, there were only two ways to parse an XML file—SAX
(Simple API for XML) and DOM (Document
Object Model). The first reads an XML file in a sequential manner and signals
the application as it finds different XML components like elements and attributes,
while the second creates a tree representation of the data in the XML document,
and then offers various methods to navigate through this data.

While each of these two techniques comes with its own set of pros and cons,
Microsoft has introduced a third paradigm with the .NET Framework called the
“pull model,” which provides non-cached, forward-only, read-only access to
XML data. In layman’s terms, this means that you can access the information in
an XML file sequentially (like SAX), with the additional option of skipping over
certain elements and their content at runtime (like the DOM). A new class,
XmlReader, has been created to perform this task.
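To make the "skipping" part of the pull model concrete, here is a minimal C#
sketch. It is only an illustration: the SkipDemo class, the VisitedMonths
helper, and the inline sample data (including the London figures) are made up
for this example. It relies on the Skip() method of the reader, which jumps
past the current element and all of its children.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipDemo
{
    // Hypothetical helper: returns the month names the reader actually visits,
    // skipping every <city> whose name attribute differs from wantedCity.
    public static List<string> VisitedMonths(string xml, string wantedCity)
    {
        List<string> visited = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (!reader.EOF)
            {
                bool unwantedCity = reader.NodeType == XmlNodeType.Element
                    && reader.Name == "city"
                    && reader.GetAttribute("name") != wantedCity;

                if (unwantedCity)
                {
                    // Skip() already moves to the node after </city>,
                    // so Read() must not be called again in this pass.
                    reader.Skip();
                }
                else
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "month")
                    {
                        visited.Add(reader.GetAttribute("name"));
                    }
                    reader.Read();
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return visited;
    }

    public static void Main()
    {
        // Made-up sample data, for illustration only.
        const string xml =
            "<temperature unit='Celsius'>" +
            "<city name='London'><month name='January'>7.0</month>" +
            "<month name='February'>8.0</month></city>" +
            "<city name='New York'><month name='January'>3.9</month></city>" +
            "</temperature>";

        foreach (string month in VisitedMonths(xml, "New York"))
        {
            Console.WriteLine(month);
        }
    }
}
```

The loop never looks inside the London element at all; the application, not
the parser, decides which parts of the document are worth reading.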

Processing XML

To see how XmlReader works, create the following XML file and save it as
temperatures.xml.

<?xml version="1.0" encoding="utf-8" ?>
<temperature unit="Celsius">
    <city name="New York">
        <month name="January">3.9</month>
        <month name="February">5.6</month>
        <month name="March">15.6</month>
        <month name="April">21.7</month>
        <month name="May">26.1</month>
        <month name="June">29.4</month>
        <month name="July">28.3</month>
        <month name="August">24.4</month>
        <month name="September">18.3</month>
        <month name="October">12.2</month>
        <month name="November">6.7</month>
        <month name="December">10.0</month>
    </city>
</temperature>

Nothing fancy here: just a list of temperature highs for
twelve months of the year in New York. The task is to calculate the yearly
average from these temperatures, which is performed by the script in Listing A.

The output of this script should look like Figure A.

Figure A

Output

If you look carefully at the script in Listing A, you’re likely to be confused,
because the first thing it does is create an instance of the XmlTextReader class
to read the temperatures.xml file. Don’t worry, there’s a simple reason.
XmlReader is an abstract class that developers can extend to build their own
parsers on top of the XML pull model. However, if you don’t have the technical
skills to write your own parser, or are just plain lazy (like me), you can use
the XmlTextReader class instead; it’s a built-in derivation of the abstract
XmlReader class.

Consistent with other members of the System.Xml assembly, the XmlTextReader
class comes with a Read() method that returns true when it successfully reads
the next XML node, and false when there are no more nodes to read. When used in
a while loop, this method ensures that the entire file is processed.

Within the while loop, I’ve used the NodeType property to identify element
nodes, and the Name property to find out their names. I’m only interested in
the unit and name attributes of the <temperature> and <city> elements
respectively. The values of these attributes are retrieved with the
GetAttribute() method. Temperature values for each <month> are obtained with
the ReadString() method. Once the individual month values are obtained, the
next step is to calculate the yearly average. This is accomplished by adding
all the values and dividing the sum by the number of months.
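Listing A is not reproduced here, so the following is a minimal C# sketch of
the steps just described rather than the original script. The YearlyAverage
class and its Average helper are names invented for this illustration; the
reader calls (Read(), NodeType, Name, GetAttribute(), ReadString()) are the
ones discussed above. It assumes temperatures.xml sits in the working
directory unless a path is passed on the command line.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

public static class YearlyAverage
{
    // Hypothetical helper: walks the document once and returns the mean of
    // all <month> values, reporting the unit and city through out parameters.
    public static double Average(TextReader input, out string unit, out string city)
    {
        unit = null;
        city = null;
        double sum = 0;
        int months = 0;

        XmlTextReader reader = new XmlTextReader(input);
        try
        {
            // Read() returns false once there are no more nodes.
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                {
                    continue;
                }

                switch (reader.Name)
                {
                    case "temperature":
                        unit = reader.GetAttribute("unit");
                        break;
                    case "city":
                        city = reader.GetAttribute("name");
                        break;
                    case "month":
                        // ReadString() returns the text inside the element.
                        sum += double.Parse(reader.ReadString(), CultureInfo.InvariantCulture);
                        months++;
                        break;
                }
            }
        }
        finally
        {
            reader.Close();
        }

        if (months == 0)
        {
            throw new InvalidOperationException("No <month> elements found.");
        }
        return sum / months;
    }

    public static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "temperatures.xml";
        using (StreamReader file = File.OpenText(path))
        {
            string unit, city;
            double average = Average(file, out unit, out city);
            Console.WriteLine("Average for {0}: {1} degrees {2}",
                city, average.ToString("F2", CultureInfo.InvariantCulture), unit);
        }
    }
}
```

For the twelve values in temperatures.xml, the sum is 202.2, so the average
works out to roughly 16.85 degrees Celsius.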

Validating XML

Every XML document must be well-formed, and ideally it should also be valid.
While any XML parser can check that a file is well-formed, only validating
parsers can check a file against external DTDs, XDRs, or XSD schemas. The
System.Xml assembly provides one such parser in the form of the
XmlValidatingReader class, which is another derivation of the abstract
XmlReader class. Consider the XSD schema shown in Listing B and its XML
document instance shown in Listing C.

Listing D shows
an example of validating a document instance against a schema. Load this
example in your browser, and you’ll see the XML file being successfully
validated by the parser, looking something like Figure B.

Figure B

Validation

To make things interesting, edit the XML file in Listing C and delete the
<netassets> element for the third mutual fund. When you attempt to validate
the file again, the XmlValidatingReader will tell you that the XML file is
invalid (Figure C).

Figure C

Invalid XML

As before, I begin by initializing an XmlTextReader object with the required
XML file, and then pass it to the constructor of the XmlValidatingReader
class. The ValidationType property sets the type of ruleset to validate
against; its values are ValidationType.Schema, ValidationType.Auto,
ValidationType.DTD, ValidationType.XDR, and ValidationType.None.

Next, the ValidationEventHandler event is linked to an event handler; this
handler will be triggered if the parser encounters an error when validating
the file. The Read() method takes care of processing the file, and the
blnValFlag variable serves as a flag to determine whether the validation
process was successful.
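Since Listings B through D are not reproduced here, the following C# sketch
shows one way the steps above could fit together. It is not the original
script. The ValidationSketch class, the IsValid helper, and the tiny
single-fund schema and documents are all invented for illustration; only the
<netassets> element name and the blnValFlag variable name come from the
article. Later versions of the .NET Framework mark XmlValidatingReader as
obsolete in favor of XmlReader.Create() with XmlReaderSettings, so expect a
compiler warning there.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Hypothetical schema: a <fund> must contain <name> and then <netassets>.
    private const string Schema = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='fund'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='name' type='xs:string'/>
        <xs:element name='netassets' type='xs:decimal'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Hypothetical helper: returns true if xml validates against Schema.
    public static bool IsValid(string xml)
    {
        bool blnValFlag = true;

        // Wrap a plain text reader in a validating reader.
        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader validator = new XmlValidatingReader(textReader);
        validator.ValidationType = ValidationType.Schema;
        validator.Schemas.Add(null, new XmlTextReader(new StringReader(Schema)));

        // The handler fires once per validation problem and clears the flag.
        validator.ValidationEventHandler += (sender, e) =>
        {
            blnValFlag = false;
            Console.WriteLine("Validation problem: " + e.Message);
        };

        try
        {
            // Reading the whole document drives the validation.
            while (validator.Read()) { }
        }
        finally
        {
            validator.Close();
        }
        return blnValFlag;
    }

    public static void Main()
    {
        // Made-up documents: the second one lacks <netassets>.
        string good = "<fund><name>Sample Fund</name><netassets>100.5</netassets></fund>";
        string bad = "<fund><name>Sample Fund</name></fund>";

        Console.WriteLine("First document valid: " + IsValid(good));
        Console.WriteLine("Second document valid: " + IsValid(bad));
    }
}
```

Because a handler is attached, validation problems are reported through the
event instead of being thrown as exceptions, which is why a simple flag is
enough to record the outcome.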

A solid foundation

In short, the abstract XmlReader class lays the foundation for the derived
XmlTextReader and XmlValidatingReader classes, and makes it possible to do all
kinds of nifty things. Will it succeed in pulling the crowds away from SAX and
DOM? Only time will tell.