
How do I... Use the .NET Framework to consume RSS feeds?

Many Web sites are now offering their content via XML-based RSS feeds. This allows automated processes to collect content from many sites and aggregate the data into a single data store. Zach Smith explains how to use the built-in functionality of the .NET Framework to collect and display RSS feed information. A sample application is also included, which implements the code shown in this article.

This article is also available as a TechRepublic download, which includes a sample application implementing the code as shown in the article.

The process of consuming an RSS feed with the .NET Framework is not complicated. The steps involved are listed below:

  1. Connect to the site offering the RSS feed.
  2. Download the feed XML.
  3. Load the feed's XML into an object that allows searching.
  4. Search the feed's XML for the nodes you want to extract.

The .NET Framework provides built-in functionality to accomplish all of these tasks. All we need to do is tie the functionality together and we'll be able to consume RSS feeds.

Connecting to the server

To connect to the server, we will use the WebRequest class. WebRequest sends requests to Web servers, and because RSS is delivered over HTTP, it is a natural choice for retrieving a feed. For http:// URLs, WebRequest.Create returns an HttpWebRequest object.

The code in Listing A demonstrates how to create a WebRequest object for a given URL.

Listing A

using System.Net;

//Create a WebRequest object for the feed's URL
WebRequest myRequest = WebRequest.Create(url);

Here, url is a string variable holding the full URL of the RSS feed. An example is the MSN Automotive RSS feed, which is located at:

http://rss-feeds.msn.com/autos/autosnews.xml
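The request object can be configured before any data is sent. WebRequest.Create only builds the request; nothing goes over the network until GetResponse is called. The following is a minimal sketch, assuming a hypothetical helper named CreateFeedRequest and illustrative timeout and User-Agent values that are not part of the original article:

```csharp
using System.Net;

public static class FeedRequestFactory
{
    // Hypothetical helper: builds (but does not send) a request for an RSS feed URL.
    public static WebRequest CreateFeedRequest(string url)
    {
        // WebRequest.Create picks the concrete type from the URL scheme;
        // http:// and https:// URLs produce an HttpWebRequest.
        WebRequest request = WebRequest.Create(url);

        // Illustrative value: give up if the server has not responded within 10 seconds.
        request.Timeout = 10000;

        // HTTP-specific settings are only available on HttpWebRequest.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Illustrative User-Agent string identifying this client to the server.
            httpRequest.UserAgent = "SampleRssReader/1.0";
        }

        return request;
    }
}
```

The returned request is then used exactly like myRequest in the next section.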

Downloading the RSS data

Once the request has been created, we need to send it and download the data that the feed provides. WebRequest provides a method for this purpose, GetResponse(), which sends the request and returns a WebResponse object that gives us access to the server's response.

The method we will use from the WebResponse object is GetResponseStream(). It returns a Stream containing the raw RSS XML sent by the server. The code in Listing B demonstrates how to get a WebResponse object from the WebRequest object and then get the response stream from the WebResponse object.

Listing B

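using System.IO;
using System.Net;

//Get the response from the WebRequest (this sends the request)
WebResponse myResponse = myRequest.GetResponse();

//Get the response's stream. Close myResponse when you are done
//with the stream so that the connection is released.
Stream rssStream = myResponse.GetResponseStream();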
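A WebResponse and its stream hold on to the underlying connection until they are closed. The sketch below (a hypothetical ReadFeedText helper, not part of the sample application) wraps them in using statements so they are released even if an exception is thrown, and reads the raw XML into a string, which can be handy for logging or for checking what a feed actually returned:

```csharp
using System.IO;
using System.Net;

public static class FeedDownloader
{
    // Hypothetical helper: downloads the raw feed XML as text.
    public static string ReadFeedText(string url)
    {
        WebRequest request = WebRequest.Create(url);

        // The using blocks dispose the response, its stream and the reader,
        // which releases the connection even if reading fails.
        using (WebResponse response = request.GetResponse())
        using (Stream rssStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(rssStream))
        {
            // StreamReader honors a byte order mark and otherwise assumes UTF-8.
            return reader.ReadToEnd();
        }
    }
}
```

Because StreamReader does not look at the encoding named in the XML declaration, a feed in another encoding (such as ISO-8859-1) may be decoded incorrectly this way; loading the stream directly into an XmlDocument, as in Listing C, lets the XML parser pick the encoding.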

Loading the RSS data into an XML document

Once we have the stream from the WebResponse object, we can load that stream into an XmlDocument object. This will allow us to easily parse the XML data and extract values from it. The easiest way to get an XmlDocument to load our Stream is to instantiate a new XmlDocument object and pass our Stream to the Load method. The code in Listing C demonstrates this.

Listing C

using System.Xml;

//Create the XML document
XmlDocument document = new XmlDocument();

//Load the stream into the XmlDocument object.
document.Load(rssStream);
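Listing D, in the next section, calls a helper named GetXmlDocumentFromFeed whose body the article does not list. A minimal sketch of what such a helper might look like, assuming it simply chains Listings A through C (the FeedLoader class name is hypothetical), is:

```csharp
using System.IO;
using System.Net;
using System.Xml;

public static class FeedLoader
{
    // Sketch of the helper used in Listing D: runs the steps from Listings A-C.
    public static XmlDocument GetXmlDocumentFromFeed(string url)
    {
        // Listing A: create the request for the feed URL.
        WebRequest myRequest = WebRequest.Create(url);

        // Listing B: send the request and get the response stream.
        using (WebResponse myResponse = myRequest.GetResponse())
        using (Stream rssStream = myResponse.GetResponseStream())
        {
            // Listing C: load the stream into an XmlDocument.
            // Load throws an XmlException if the feed is not well-formed XML.
            XmlDocument document = new XmlDocument();
            document.Load(rssStream);

            // Load has read the whole stream, so the document stays usable
            // after the response is disposed.
            return document;
        }
    }
}
```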

Parsing the XML

This is the hardest part of consuming the RSS feed. We must use the XmlDocument we created in the previous section to parse out the specific XML nodes that contain our data. The most common nodes of interest are:

  • The feed's title, which is located at /rss/channel/title within the feed XML.
  • The feed's articles, which are located at /rss/channel/item within the feed XML. There can be multiple nodes at this location.
  • An article's title, which is located at title within an article node.
  • An article's description, which is located at description within an article node.
  • An article's link, which is located at link within an article node.

To get to these nodes, we will use the SelectSingleNode and SelectNodes methods that XmlDocument inherits from XmlNode. Both methods accept an XPath query: SelectSingleNode returns the first matching node, and SelectNodes returns all matching nodes as an XmlNodeList.
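The two methods also differ in how they report "no match": SelectSingleNode returns null, while SelectNodes returns an empty list. The sketch below runs the queries from the list above against a small sample document; the feed contents and the XPathDemo class are invented for illustration:

```csharp
using System.Text;
using System.Xml;

public static class XPathDemo
{
    // Invented sample feed with two items; the second item has no description.
    public const string SampleFeed =
        "<rss version=\"2.0\"><channel>" +
        "<title>Sample Feed</title>" +
        "<item><title>First</title><link>http://example.com/1</link>" +
        "<description>One</description></item>" +
        "<item><title>Second</title><link>http://example.com/2</link></item>" +
        "</channel></rss>";

    public static XmlDocument LoadSample()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(SampleFeed); // LoadXml parses a string instead of a stream
        return document;
    }

    // Returns the channel title, the item count, and whether each item has a description.
    public static string Summarize(XmlDocument document)
    {
        // SelectSingleNode: the absolute path matches only the channel's own title.
        string feedTitle = document.SelectSingleNode("/rss/channel/title").InnerText;

        // SelectNodes: every <item> directly under <channel>.
        XmlNodeList items = document.SelectNodes("/rss/channel/item");

        StringBuilder summary = new StringBuilder();
        summary.Append(feedTitle).Append(": ").Append(items.Count).Append(" items");
        foreach (XmlNode item in items)
        {
            // Relative query resolved against the item node; null when missing.
            bool hasDescription = item.SelectSingleNode("description") != null;
            summary.Append(hasDescription ? " [desc]" : " [no desc]");
        }
        return summary.ToString();
    }
}
```

For the sample feed, Summarize(LoadSample()) returns "Sample Feed: 2 items [desc] [no desc]".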

The code shown in Listing D demonstrates how to extract each of these elements from the RSS feed using an XmlDocument and XPath.

Listing D

using System.Xml;

//Get an XmlDocument object that contains the feed's XML.
//GetXmlDocumentFromFeed is a helper that wraps the steps
//from Listings A through C.
XmlDocument feedDocument =
    GetXmlDocumentFromFeed("http://rss-feeds.msn.com/autos/autosnews.xml");

//Create an XmlNamespaceManager for our namespace.
XmlNamespaceManager manager =
    new XmlNamespaceManager(feedDocument.NameTable);

//Add the RSS 1.0 namespace to the manager. The unprefixed queries
//below match RSS 2.0 elements, which are in no namespace, so the
//manager only comes into play for queries that use the "rss:" prefix.
manager.AddNamespace("rss", "http://purl.org/rss/1.0/");

//Get the title node out of the RSS document
XmlNode titleNode =
    feedDocument.SelectSingleNode("/rss/channel/title", manager);

//Get the article nodes
XmlNodeList articleNodes =
    feedDocument.SelectNodes("/rss/channel/item", manager);

//Loop through the articles and extract
//their data.
foreach (XmlNode articleNode in articleNodes)
{
    //Get the article's title.
    string title =
        articleNode.SelectSingleNode("title", manager).InnerText;

    //Get the article's link
    string link =
        articleNode.SelectSingleNode("link", manager).InnerText;

    //Get the article's description
    string description =
        articleNode.SelectSingleNode("description", manager).InnerText;

    //Use title, link, and description here (for example, display them).
}
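Calling InnerText directly on the result of SelectSingleNode, as Listing D does, throws a NullReferenceException when an item omits one of the elements. RSS 2.0 makes every item element optional as long as a title or a description is present, so this does happen in practice. A sketch, assuming a hypothetical FeedItem class and GetChildText helper, collects the items into a list and substitutes an empty string for missing elements:

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical container for one article's data.
public class FeedItem
{
    public string Title;
    public string Link;
    public string Description;
}

public static class FeedParser
{
    // Returns the text of the named child element, or "" if the element is missing.
    private static string GetChildText(XmlNode parent, string childName)
    {
        XmlNode child = parent.SelectSingleNode(childName);
        return child != null ? child.InnerText : string.Empty;
    }

    // Extracts every /rss/channel/item from an RSS 2.0 document.
    public static List<FeedItem> ParseItems(XmlDocument feedDocument)
    {
        List<FeedItem> items = new List<FeedItem>();
        foreach (XmlNode articleNode in feedDocument.SelectNodes("/rss/channel/item"))
        {
            FeedItem item = new FeedItem();
            item.Title = GetChildText(articleNode, "title");
            item.Link = GetChildText(articleNode, "link");
            item.Description = GetChildText(articleNode, "description");
            items.Add(item);
        }
        return items;
    }
}
```

Returning a list of plain objects also separates parsing from display, so the same parser can feed a console listing, a Web page, or a data store.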

Not all RSS feeds are created equal

While it would be great if all RSS feeds used the same format, there are several versions of RSS (including 0.9x, 1.0, and 2.0), and sites implement them in different ways. Many feeds follow the rss/channel/item layout described in this article, which RSS 0.9x and 2.0 use, while others are slightly different. For more information on RSS formats, check out this excerpt from O'Reilly.
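As an example of how formats differ, RSS 1.0 feeds are RDF documents: the root element is rdf:RDF, item elements sit beside channel rather than inside it, and the RSS elements belong to the http://purl.org/rss/1.0/ namespace. For such feeds the namespace manager from Listing D becomes necessary, and the queries need the rss: prefix. A minimal sketch with an invented, simplified sample (a conforming feed's channel would also contain link, description, and an items sequence); the Rss10Reader class is hypothetical:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class Rss10Reader
{
    // Invented RSS 1.0 sample: items are siblings of <channel>, not children.
    public const string SampleFeed =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" " +
        "xmlns=\"http://purl.org/rss/1.0/\">" +
        "<channel rdf:about=\"http://example.com/\"><title>Sample RSS 1.0 Feed</title></channel>" +
        "<item rdf:about=\"http://example.com/1\"><title>First</title>" +
        "<link>http://example.com/1</link></item>" +
        "</rdf:RDF>";

    public static List<string> GetItemTitles(XmlDocument document)
    {
        XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
        manager.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        // XPath 1.0 has no default namespace, so elements in the RSS 1.0
        // namespace can only be matched through an explicit prefix.
        manager.AddNamespace("rss", "http://purl.org/rss/1.0/");

        List<string> titles = new List<string>();
        foreach (XmlNode item in document.SelectNodes("/rdf:RDF/rss:item", manager))
        {
            XmlNode title = item.SelectSingleNode("rss:title", manager);
            if (title != null)
            {
                titles.Add(title.InnerText);
            }
        }
        return titles;
    }
}
```

Run against this sample, the unprefixed query /rss/channel/item from Listing D finds nothing, while GetItemTitles returns the single title "First".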
