Over the years, XML has become an important technology
because of its unrivalled ability to mark up content and make it more useful.
Almost all modern development platforms now provide some kind of native support
for reading and parsing XML documents. This includes ASP.NET,
which comes with a handy XmlDocument
class for parsing an XML document tree using the DOM.

In this article, I will give you a crash
course in the XmlDocument class,
showing you how to create a tree representation of an XML document in memory
and traverse it. I will then use this knowledge to parse Builder.com’s own RSS feed and save the
items within it to a Microsoft SQL Server
database.

Faster, higher, stronger

The XML file I’ll be using throughout this tutorial is a
list showing the number of Olympic medals—gold, silver and bronze—won by the
athletes of the competing countries:

<?xml version="1.0"?>
<countries>
    <country name="USA" rank="1">
        <gold>35</gold>
        <silver>39</silver>
        <bronze>29</bronze>
    </country>
    <country name="China" rank="2">
        <gold>32</gold>
        <silver>17</silver>
        <bronze>14</bronze>
    </country>
    <country name="Russia" rank="3">
        <gold>27</gold>
        <silver>27</silver>
        <bronze>38</bronze>
    </country>
    <!-- more countries here... -->
</countries>

Now, let’s look at Listing A to see how to parse this file using the XmlDocument class.

While this isn’t exactly riveting reading, the output does
give you an idea about the tree-like structure of the DOM (see Figure A).

Figure A

Most of this tree generation happens through the XmlDocument object, whose Load() method reads the contents of the
XML file into memory. Once the file is loaded, the XmlDocument object builds a DOM tree that begins with the
“root element” (the main <countries> element) and progresses
hierarchically to its children.

To begin traversing the tree, it is necessary to first
obtain a reference to the root element through the “DocumentElement”
property of the XmlDocument object.
This property returns the required reference in the form of an XmlElement object, which in turn
exposes properties containing information about the name and data stored by the
node, and methods to access the next level of the tree.
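
As a rough illustration of these two steps, the sketch below loads a shortened copy of the medals file and obtains its root element. It is not the article's Listing A: it uses Python's standard xml.dom.minidom module rather than the .NET XmlDocument class, on the assumption that the DOM concepts carry over. Here parseString() stands in for Load() and documentElement for “DocumentElement”; the variable names are made up for illustration.

```python
from xml.dom import minidom

# Shortened copy of the medals file from this article.
MEDALS_XML = """<?xml version="1.0"?>
<countries>
    <country name="USA" rank="1">
        <gold>35</gold>
        <silver>39</silver>
        <bronze>29</bronze>
    </country>
    <country name="China" rank="2">
        <gold>32</gold>
        <silver>17</silver>
        <bronze>14</bronze>
    </country>
</countries>"""

# parseString() reads the XML and builds the DOM tree in memory
# (the counterpart of Load() in the text above).
document = minidom.parseString(MEDALS_XML)

# documentElement is a reference to the root element of the tree.
root = document.documentElement
print(root.tagName)

# Whitespace between tags appears as text nodes, so keep element children only.
countries = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print(len(countries), countries[0].getAttribute("name"))
```

The filter on nodeType is needed because the indentation between tags is itself part of the tree, as text nodes.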

Since each node exposes methods to get to the next level of
the tree, a common way to traverse a DOM tree is to write a function that runs
recursively until all the nodes are processed. In Listing A, this
function is called ReadXmlFile(), and
it accepts an XmlNode object as its input
parameter. The “NodeType” property of this object is then used to
determine the type of node (element, attribute, comment or text), and its name and
contents are displayed. For attributes, the “Name” and
“InnerText” properties of each XmlAttribute
object are used to obtain the name-value pair; for character
data, the “Value” property is used to obtain the text content of the
node.

The “ChildNodes” property is then used to check whether
the node has children. If it does, the XmlNodeList
object it returns is walked with a for loop, and each child
is passed back to the custom ReadXmlFile() function for further processing. The integer variable
“intLevel” tracks the current depth in the XML tree as the
recursion progresses. Recursion stops at nodes that have no children, and the
traversal ends once the last node has been processed.
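
The recursive pattern described above can be sketched as follows. This is a Python analogue built on xml.dom.minidom, not the article's ReadXmlFile() listing. The function name read_xml_node, the level parameter (playing the role of “intLevel”) and the one-line sample document are assumptions made for illustration.

```python
from xml.dom import minidom, Node

# Hypothetical one-line sample so the node order is easy to follow.
SAMPLE = ('<countries><country name="USA" rank="1"><gold>35</gold></country>'
          '<!-- more --></countries>')

def read_xml_node(node, level=0, lines=None):
    """Describe node and all its descendants, indenting each line by depth."""
    if lines is None:
        lines = []
    indent = "  " * level
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append(f"{indent}element: {node.tagName}")
        # Attributes are not child nodes; they live in a separate map.
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            lines.append(f"{indent}  attribute: {attr.name}={attr.value}")
    elif node.nodeType == Node.TEXT_NODE:
        text = node.data.strip()
        if text:  # skip whitespace-only text nodes
            lines.append(f"{indent}text: {text}")
    elif node.nodeType == Node.COMMENT_NODE:
        lines.append(f"{indent}comment: {node.data.strip()}")
    # Recurse into the children; a node without children ends this branch.
    for child in node.childNodes:
        read_xml_node(child, level + 1, lines)
    return lines

tree_lines = read_xml_node(minidom.parseString(SAMPLE).documentElement)
print("\n".join(tree_lines))
```

Collecting the lines in a list rather than printing inside the function keeps the traversal easy to check and reuse.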

Builder.com on your Web site

The steps in the previous example make up a fairly standard
process for traversing an XML document tree, from its root element down to its
last node. Next, let’s look at a more practical example. Consider Really Simple
Syndication (RSS), which allows a Webmaster to publish the content of
a Web site as an XML “feed.” With the ASP.NET XmlDocument class, it becomes easy to
write an RSS client that reads such a feed and uses the information inside it on
your own Web site. This is precisely what the next example demonstrates: how
to use the XmlDocument class to
extract information from a remote RSS feed into a local database. Once it is in
the database, it can be accessed for display or search purposes.

Now, you might not have known this, but Builder.com has its
own RSS feed as well.
Listing B shows a sample of what it looks like.

What I need now is a script that reads this RSS feed, parses
it to extract the headlines, and converts the data inside it to INSERT
statements suitable for use with MS-SQL. The code is in Listing C.

In this script, since I’m primarily interested in the
headlines, my Page_Load() function
doesn’t waste time traversing each and every element. Instead, it uses the
GetElementsByTagName() method of the XmlDocument object to filter the tree
down to only those elements named <item>, and their children. The return
value is an XmlNodeList, which is
passed to the custom storeRSSFeed()
function for further processing.
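
To make the filtering step concrete, here is a small sketch in Python, using xml.dom.minidom's getElementsByTagName() as an analogue of the .NET method. It is not the article's Listing C. The feed below is a made-up two-item RSS 2.0 document with placeholder titles and URLs (not Builder.com's actual feed), and child_text() is a hypothetical helper.

```python
from xml.dom import minidom

# Made-up RSS 2.0 feed; every title and URL here is a placeholder.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Placeholder channel</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

def child_text(element, tag):
    """Return the text of the first <tag> inside element, or '' if absent or empty."""
    matches = element.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return ""
    return matches[0].firstChild.data

feed = minidom.parseString(RSS)

# Jump straight to the <item> elements instead of walking the whole tree.
items = feed.getElementsByTagName("item")
headlines = [(child_text(item, "title"), child_text(item, "link")) for item in items]
print(headlines)
```

Because the search starts from each item element, the channel's own title and link are not picked up.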

The storeRSSFeed()
function first creates instances of the objects required for database access,
and then iterates over the supplied XmlNodeList
collection to access each child. The “InnerText” property is used to
populate the appropriate parameter of the SqlCommand
object, and the query is executed to save the information to the database.
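
The same store-each-item loop can be sketched in Python. This is only an analogue of storeRSSFeed(): it swaps in an in-memory SQLite database (the standard sqlite3 module) for Microsoft SQL Server so that it runs anywhere. The column names title and link, the function name store_rss_items and the sample item are assumptions, not taken from the article's listing. As in the SqlCommand approach, values are bound as parameters rather than concatenated into the SQL string.

```python
import sqlite3
from xml.dom import minidom

# Made-up single-item feed; the apostrophe shows why bound parameters matter.
RSS = ("<rss version='2.0'><channel><item>"
       "<title>It's a headline</title><link>http://example.com/1</link>"
       "</item></channel></rss>")

def store_rss_items(connection, items):
    """Insert one row per <item> element, binding values as query parameters."""
    connection.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, link TEXT)")
    for item in items:
        title = item.getElementsByTagName("title")[0].firstChild.data
        link = item.getElementsByTagName("link")[0].firstChild.data
        # The ? placeholders let the driver handle quoting safely.
        connection.execute(
            "INSERT INTO items (title, link) VALUES (?, ?)", (title, link)
        )
    connection.commit()

connection = sqlite3.connect(":memory:")  # stand-in for the SQL Server database
store_rss_items(connection, minidom.parseString(RSS).getElementsByTagName("item"))
rows = connection.execute("SELECT title, link FROM items").fetchall()
print(rows)
```

This sketch assumes every item has a non-empty title and link; a real feed reader would need to handle missing elements.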

A quick SELECT query on the “items” table using
the SQL Query Analyzer tool confirms that the INSERT operation
was a success, as shown in Figure B.

Figure B

To see what else you can do with the XmlDocument object, drop by the MSDN Library documentation page. And if RSS intrigues you, point
your browser to the RSS 2.0 documentation page for detailed specifications and tutorials.