
Parsing the News.com RSS feed with PHP

Taking an RSS feed like the one from News.com and turning it into content for your Web site is a breeze. We show you how with PHP's XML extensions.

RSS 2.0 is an XML vocabulary that provides a means for describing news and events so they can be shared across the Web in a simple and standardized way. Sites such as News.com offer feeds of their news articles that you can incorporate into your own Web site.

Creating the XML parser
So how do you incorporate RSS into a PHP Web site? Since RSS is XML-based, we can use PHP's XML libraries to handle the parsing of RSS data elements. The PHP XML extension is implemented via Expat, an XML parser written in the C language. Let's get started by looking at the code in Listing A.

In this example, I create a basic XML parser and then free the XML parser resource. This is a very simple example from which we'll build a small RSS application. The first line calls the xml_parser_create function, which does pretty much what its name implies: it creates an XML parser. The last line of code, the call to the xml_parser_free function, frees the XML parser from memory. Now that we have the basic creation and removal of the XML parser, we can proceed to greater things.
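
The listing itself is not reproduced on this page. As a rough sketch of the create-and-free pattern just described (the variable name $parser is my own choice, not necessarily the one used in Listing A), the code might look something like this:

```php
<?php
// Create the parser. On PHP 5 and 7 this returns a resource;
// on PHP 8 it returns an XMLParser object.
$parser = xml_parser_create();

// Handler registration and the actual parsing would go here.

// Release the parser once parsing is done. On PHP 8 the object is
// also cleaned up automatically when it goes out of scope, so this
// call is kept mainly to mirror the description above.
xml_parser_free($parser);
```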

Element handlers
In this next example (Listing B), I've created some functions and made various XML function calls to set handlers in the parser. The three functions are used to handle specific events that occur when an element starts or stops and when there is character data.

The first function, startElement, handles all the opening XML tags. When the parser comes to a new element (XML tag), it calls startElement with three parameters: the first is the parser resource, the second is the tag name, and the third contains the attributes associated with that element (tag). The attributes are contained in an array that I've named $attrs. As the parser passes the tags to this function, they're printed out again as XML tags (not a very productive function, but it illustrates what we want to do).

The endElement function does the same thing as startElement, except that it deals with the closing tags of the XML elements. To keep things simple, I only print out the tag name again as an XML closing tag. Because character data sits between an element's start and end tags, the parser calls charElement after startElement and before endElement.

In the charElement function, again I'm only printing out the data to the user. These functions play a vital role in the processing of XML elements, and you should familiarize yourself with how the parser works with each one. Though I'm only using functions in this simple example, you have the ability to create a class, with its associated functions, to deal with all your XML parsing needs. This gives flexibility and modularity to your applications.
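
To make the three handlers concrete, here is a minimal sketch of how they might be written. The handler names come from the article; the echoXml helper and the sample XML string are my own assumptions, added only so the example can run on its own without a live feed.

```php
<?php
// Called for every opening tag. Note the parameter order:
// parser first, then the tag name, then the attribute array.
function startElement($parser, $name, $attrs)
{
    echo "<$name>";
}

// Called for every closing tag.
function endElement($parser, $name)
{
    echo "</$name>";
}

// Called for the text between an opening and a closing tag.
function charElement($parser, $data)
{
    echo $data;
}

// Hypothetical helper: run the three handlers over an XML string.
function echoXml($xml)
{
    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'startElement', 'endElement');
    xml_set_character_data_handler($parser, 'charElement');
    xml_parse($parser, $xml, true);   // true: this string is the whole document
    xml_parser_free($parser);
}

// Prints <ITEM><TITLE>Hello</TITLE></ITEM>
echoXml('<item><title>Hello</title></item>');
```

The tag names come back in upper case because the parser's case-folding option is enabled by default.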

Now that the element functions are defined, there are still some missing pieces. The parser needs to know which functions will handle which events. The xml_set_element_handler function makes that connection between the parser and the handling functions. It takes three arguments: the parser, the start element handler, and the end element handler. The last two arguments are strings that contain the names of the functions the parser will use as user callback functions. Instead of a string, an array containing an object reference and a method name can be used to assign a method as the callback.
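
The array form of the callback is easiest to see with a small class. The sketch below is only an illustration: the TagCollector class, its method names, and the sample XML are all assumptions of mine rather than anything from the listings.

```php
<?php
// Hypothetical class that keeps the handlers and the collected
// tag names together in one object.
class TagCollector
{
    public $opened = array();
    public $closed = array();

    public function start($parser, $name, $attrs)
    {
        $this->opened[] = $name;
    }

    public function end($parser, $name)
    {
        $this->closed[] = $name;
    }
}

$collector = new TagCollector();
$parser = xml_parser_create();

// Each handler is an array of (object, method name) rather than
// a string holding a function name.
xml_set_element_handler(
    $parser,
    array($collector, 'start'),
    array($collector, 'end')
);

xml_parse($parser, '<rss><channel></channel></rss>', true);
xml_parser_free($parser);

print_r($collector->opened);   // RSS, then CHANNEL
print_r($collector->closed);   // CHANNEL, then RSS
```

Keeping the state on the object avoids global variables, which is one reason to prefer this form once the handlers need to remember anything between calls.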

When we call the xml_set_character_data_handler function, we're telling the XML parser to send all character data to the specified user function. This function accepts two arguments: The first is the parser resource, and the second is the user-defined callback function that will receive the character data from the XML parser.

The user callback function must also accept two arguments: the parser resource and a data string. The parser resource is the parser created by the xml_parser_create function. In place of a function name, xml_set_character_data_handler can also accept an array containing an object reference and a method name, as described above.
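
One detail worth illustrating is that the parser may hand over the text of a single element in several pieces, for example around an entity such as &amp;amp;. The sketch below, which uses a hypothetical TextBuffer class and a made-up sample string, appends each piece to a buffer instead of assuming the text arrives in one call.

```php
<?php
// Hypothetical collector: the callback appends to a buffer because
// one run of text can be delivered in more than one call.
class TextBuffer
{
    public $text = '';

    // Two arguments, as required: the parser and the data string.
    public function append($parser, $data)
    {
        $this->text .= $data;
    }
}

$buffer = new TextBuffer();
$parser = xml_parser_create();

// Object-and-method form of the character data callback.
xml_set_character_data_handler($parser, array($buffer, 'append'));

xml_parse($parser, '<title>Tom &amp; Jerry</title>', true);
xml_parser_free($parser);

echo $buffer->text;   // Tom & Jerry
```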

Now that we have everything set up and defined, we can move on to actually opening and parsing the RSS file. I use the fopen function to open and set a file pointer to the RSS resource, in this case a news feed from News.com. If the script can't open the file pointer, it simply stops. After I open the file pointer, I use a while loop to read in the RSS file. As I loop through the file data, I parse the XML data with the xml_parse function.

The xml_parse function takes three arguments; the first two are required, while the last is optional. The first argument is the parser resource; the second is the data to be parsed. In the example in Listing B, I pass the result of the feof function as the third argument to test whether the file pointer has reached the end of the file. If this is true, the xml_parse function knows the second argument is the last chunk of data to be parsed for that document or RSS file. After the parsing is complete, I close the file pointer and free the XML parser.
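
The open, read, and parse loop described above could be sketched as follows. This is not Listing B: the parseFeed helper, the feedStart, feedEnd, and feedText handler names, and the 4 KB chunk size are my own assumptions, and the helper returns false where the original script simply stops.

```php
<?php
// Hypothetical handlers that echo the document back out.
function feedStart($parser, $name, $attrs) { echo "<$name>"; }
function feedEnd($parser, $name)           { echo "</$name>"; }
function feedText($parser, $data)          { echo $data; }

// Hypothetical helper: stream a feed through the parser in 4 KB chunks.
// Returns true on success, false if the feed cannot be opened or
// is not well-formed XML.
function parseFeed($source)
{
    $fp = @fopen($source, 'r');
    if (!$fp) {
        return false;                 // could not open the feed
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'feedStart', 'feedEnd');
    xml_set_character_data_handler($parser, 'feedText');

    $ok = true;
    while ($ok && !feof($fp)) {
        $data = fread($fp, 4096);
        // Once the end of the file has been reached, feof() returns
        // true, which tells xml_parse that this is the final chunk.
        if (!xml_parse($parser, $data, feof($fp))) {
            $ok = false;              // parse error: stop reading
        }
    }

    fclose($fp);
    xml_parser_free($parser);
    return $ok;
}
```

Calling parseFeed with a local file path works anywhere; passing a URL only works where the allow_url_fopen setting is enabled.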

Printing the results
Now that I have the RSS parsed by PHP, what next? I can set up the functions to print out only the information that I want from the RSS file. Take a look at Listing C, a simple example of how to set up the functions to print only the title, link, and description. This last example should give you a feel for how the functions work together to display only the elements you want the user to see.
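
One way to filter the output down to those three elements is to remember which element is currently open and keep its text only if it is one of the wanted names. The sketch below is an assumption about how that could be done, not a copy of Listing C; the ItemPrinter class and the sample feed data are invented for illustration.

```php
<?php
// Hypothetical handler class: remembers which element is open and
// keeps only the text of <title>, <link> and <description>.
class ItemPrinter
{
    private $wanted = array('TITLE', 'LINK', 'DESCRIPTION');
    private $current = '';
    private $buffer = '';
    public $lines = array();

    public function start($parser, $name, $attrs)
    {
        $this->current = $name;
        $this->buffer = '';
    }

    public function text($parser, $data)
    {
        if (in_array($this->current, $this->wanted)) {
            $this->buffer .= $data;   // text may arrive in pieces
        }
    }

    public function end($parser, $name)
    {
        if ($name === $this->current && in_array($name, $this->wanted)) {
            $label = ucfirst(strtolower($name));
            $this->lines[] = $label . ': ' . trim($this->buffer);
        }
        $this->current = '';
    }
}

// Made-up sample data standing in for a real feed.
$rss = '<rss version="2.0"><channel><item>'
     . '<title>Example headline</title>'
     . '<link>http://example.com/story</link>'
     . '<description>A short summary.</description>'
     . '</item></channel></rss>';

$printer = new ItemPrinter();
$parser = xml_parser_create();
xml_set_element_handler($parser, array($printer, 'start'), array($printer, 'end'));
xml_set_character_data_handler($parser, array($printer, 'text'));
xml_parse($parser, $rss, true);
xml_parser_free($parser);

echo implode("\n", $printer->lines), "\n";
```

Note that this simple filter also picks up the channel's own title, link, and description if the feed has them; restricting output to items would need one more flag that is set while an item element is open.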