Over the years, XML has become an important technology because of its ability to mark up content and make it more useful. Almost all modern development platforms now provide some kind of native support for reading and parsing XML documents. This includes ASP.NET, which has access to the .NET Framework's handy XmlDocument() class (in the System.Xml namespace) to parse an XML document tree using the Document Object Model (DOM).
Over the course of this article, I will give you a crash course in the XmlDocument() object, showing you how to create a tree representation of an XML document in memory and traverse it, and then use this knowledge to parse Builder.com's own RSS feed and save the items within it to a Microsoft SQL Server database.
Faster, higher, stronger
The XML file I'll be using throughout this tutorial is a list showing the number of Olympic medals—gold, silver and bronze—won by the athletes of the competing countries:
<country name="USA" rank="1">
<country name="China" rank="2">
<country name="Russia" rank="3">
<!-- more countries here.. -->
Now, let's look at Listing A to see how to parse this file using the XmlDocument() object.
While this isn't exactly riveting reading, the output does give you an idea about the tree-like structure of the DOM (see Figure A).
Most of this tree generation happens through the XmlDocument() object, whose Load() method reads the contents of the XML file into memory. Once the file is loaded, the XmlDocument() object builds a DOM tree that begins with the "root element" (the main <countries> element) and progresses hierarchically to its children.
To begin traversing the tree, it is necessary to first obtain a reference to the root element through the "DocumentElement" property of the XmlDocument() object. This property returns the required reference in the form of an XmlElement() object, which in turn exposes properties containing information about the name and data stored by the node, and methods to access the next level of the tree.
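As a minimal sketch of the loading step, the C# snippet below builds the tree and reads the root element. To stay self-contained it calls LoadXml() on an inline string where a page would call Load() on a file. The class and method names (LoadDemo, LoadRoot) are illustrative, and the sample uses only the element and attribute names visible in the excerpt above.

```csharp
using System;
using System.Xml;

public static class LoadDemo
{
    // Inline stand-in for the medals file; self-closing <country> elements
    // are an assumption made to keep the sample short.
    public const string SampleXml =
        "<countries>" +
        "<country name=\"USA\" rank=\"1\" />" +
        "<country name=\"China\" rank=\"2\" />" +
        "<country name=\"Russia\" rank=\"3\" />" +
        "</countries>";

    public static XmlElement LoadRoot(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);            // doc.Load(path) would read a file instead
        return doc.DocumentElement;  // reference to the root element
    }

    public static void Main()
    {
        XmlElement root = LoadRoot(SampleXml);
        Console.WriteLine(root.Name);              // countries
        Console.WriteLine(root.ChildNodes.Count);  // 3
    }
}
```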
Since each node exposes methods to get to the next level of the tree, a common way to traverse a DOM tree is to write a function that runs recursively until all the nodes are processed. In the example above, this function is called ReadXmlFile(), and it accepts an XmlNode() object as its input parameter. The "NodeType" property of this object is then used to determine the type of node (element, attribute, comment or text), and its name and contents are displayed. For attributes, the "Name" and "InnerText" properties of the XmlAttribute() object are used to obtain the name-value pair for each attribute; for character data, the "Value" property is used to obtain the text content of the node.
The "ChildNodes" property is then used to test whether the node has children. It returns an XmlNodeList() object, and if that list is not empty, a for() loop is used to pass each child back to the custom ReadXmlFile() function for further processing. The integer variable "intLevel" tracks the current depth in the XML tree as the recursion progresses. Recursion ends once the last node has been processed and no further children exist.
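A recursive walk along these lines could be sketched in C# as follows. It is an illustration under stated assumptions rather than a reproduction of Listing A: passing intLevel as a parameter, collecting output in a StringBuilder, the TreeWalker class name, and the <gold> element with its placeholder value are all choices made for this sketch.

```csharp
using System;
using System.Text;
using System.Xml;

public static class TreeWalker
{
    // Appends one indented line per node, then recurses into its children.
    public static void ReadXmlFile(XmlNode node, int intLevel, StringBuilder output)
    {
        string indent = new string(' ', intLevel * 2);

        switch (node.NodeType)
        {
            case XmlNodeType.Element:
                output.AppendLine(indent + "Element: " + node.Name);
                // Attributes are not child nodes, so they are listed separately.
                foreach (XmlAttribute attr in node.Attributes)
                {
                    output.AppendLine(indent + "  Attribute: " + attr.Name + " = " + attr.InnerText);
                }
                break;
            case XmlNodeType.Text:
                output.AppendLine(indent + "Text: " + node.Value);
                break;
            case XmlNodeType.Comment:
                output.AppendLine(indent + "Comment: " + node.Value);
                break;
        }

        // ChildNodes is empty for leaf nodes, which ends the recursion.
        XmlNodeList children = node.ChildNodes;
        for (int i = 0; i < children.Count; i++)
        {
            ReadXmlFile(children[i], intLevel + 1, output);
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // <gold>0</gold> is placeholder data, not a real medal count.
        doc.LoadXml(
            "<countries>" +
            "<country name=\"USA\" rank=\"1\"><gold>0</gold></country>" +
            "<!-- more countries here.. -->" +
            "</countries>");

        StringBuilder output = new StringBuilder();
        ReadXmlFile(doc.DocumentElement, 0, output);
        Console.Write(output.ToString());
    }
}
```

Note that the attributes are read from the element's Attributes collection; they do not appear in ChildNodes, so the recursion never visits them as children.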
Builder.com on your Web site
The steps in the previous example make up a fairly standard process for traversing an XML document tree, from its root element to its closing element. Next, let's look at a more practical example. Consider Really Simple Syndication (RSS), which allows a Webmaster to propagate the content of a Web site using an XML "feed." With the ASP.NET XmlDocument() object, it becomes easy to write an RSS client to access such a feed and use the information inside it on your own Web site. This is precisely what the next example will demonstrate: how to use the XmlDocument() object to extract information from a remote RSS feed into a local database. Once it is in the database, it can be accessed for display or search purposes.
What I need now is a script that reads this RSS feed, parses it to extract the headlines, and converts the data inside it to INSERT statements suitable for use with MS-SQL. The code is in Listing C.
In this script, since I'm primarily interested in the headlines, my Page_Load() function doesn't waste time traversing each and every element. Instead, it uses the GetElementsByTagName() method of the XmlDocument() object to filter the tree down to only those elements named <item>, and their children. The return value is an XmlNodeList(), which is passed to the custom storeRSSFeed() function for further processing.
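The filtering step might look something like the C# sketch below. The feed is an inline stand-in with placeholder titles and links (not Builder.com's actual feed), and the RssItems class and GetItems() helper are names invented for this illustration.

```csharp
using System;
using System.Xml;

public static class RssItems
{
    // Tiny stand-in for a feed; titles and links are placeholders.
    public const string SampleFeed =
        "<rss version=\"2.0\"><channel><title>Example feed</title>" +
        "<item><title>First headline</title><link>http://example.com/1</link></item>" +
        "<item><title>Second headline</title><link>http://example.com/2</link></item>" +
        "</channel></rss>";

    public static XmlNodeList GetItems(string feedXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(feedXml);  // a real page would call Load() on the feed instead
        // Returns every <item> element in the document, wherever it sits in the tree.
        return doc.GetElementsByTagName("item");
    }

    public static void Main()
    {
        foreach (XmlNode item in GetItems(SampleFeed))
        {
            // item["title"] is the first child element named <title>.
            Console.WriteLine(item["title"].InnerText);
        }
    }
}
```

Because GetElementsByTagName() matches on element name alone, the channel-level <title> is skipped here only because the loop starts from the <item> elements and reads their own children.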
The storeRSSFeed() function first creates instances of the objects required for database access, and then iterates over the supplied XmlNodeList() collection to access each child. The "InnerText" property is used to populate the appropriate parameter of the SqlCommand() object, and the query is executed to save the information to the database.
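A hedged sketch of such a function is shown below. Only the "items" table name comes from this article; the column names (title, link), their sizes, the connectionString parameter, and the BuildInsert() and ChildText() helpers are assumptions made for illustration. It uses System.Data.SqlClient, which ships with the .NET Framework, and needs a reachable SQL Server instance to actually execute.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Xml;

public static class RssStore
{
    // Builds a parameterized INSERT for one <item> element.
    // Column names and sizes are assumptions, not taken from the article.
    public static SqlCommand BuildInsert(XmlNode item, SqlConnection conn)
    {
        SqlCommand cmd = new SqlCommand(
            "INSERT INTO items (title, link) VALUES (@title, @link)", conn);
        cmd.Parameters.Add("@title", SqlDbType.NVarChar, 255).Value = ChildText(item, "title");
        cmd.Parameters.Add("@link", SqlDbType.NVarChar, 255).Value = ChildText(item, "link");
        return cmd;
    }

    // Returns the text of the named child element, or "" if it is missing.
    static string ChildText(XmlNode item, string name)
    {
        XmlElement child = item[name];
        return child == null ? "" : child.InnerText;
    }

    // Opens one connection and runs one INSERT per <item> in the list.
    public static void storeRSSFeed(XmlNodeList items, string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (XmlNode item in items)
            {
                using (SqlCommand cmd = BuildInsert(item, conn))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```

A page could call this as storeRSSFeed(doc.GetElementsByTagName("item"), connectionString). Passing values through parameters instead of concatenating them into the SQL text keeps quotes in a headline from breaking the statement.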
A quick SELECT query on the "items" table using the SQL Query Analyzer tool confirms that the INSERT operation was a success, as shown in Figure B.
To see what else you can do with the XmlDocument() object, drop by the MSDN Library documentation page. And if RSS intrigues you, point your browser to the RSS 2.0 documentation page for detailed specifications and tutorials.