Parsing the Builder AU RSS feed with PHP

Learn how to use RSS and PHP to incorporate Builder AU content into your own Web site.

RSS 2.0 is an XML vocabulary that provides a means for describing news and events so they can be shared across the Web in a simple and standardised way. Sites such as Builder AU and ZDNet Australia offer feeds of their news articles that you can incorporate into your own Web site.

Creating the XML parser
So how do you incorporate RSS into a PHP Web site? Since RSS is XML-based, we can use PHP's XML libraries to handle the parsing of RSS data elements. The PHP XML extension is implemented via Expat, an XML parser written in the C language. Let's get started by looking at the code below:

<?php
// Create an xml parser
$xmlParser = xml_parser_create();

// Free xml parser from memory
xml_parser_free( $xmlParser );
?>

In this example, I created a basic XML parser and then freed the parser resource. It is a very simple starting point from which we'll build a small RSS application. The first line calls the xml_parser_create function, which does what its name implies: it creates an XML parser. The last line of code, the call to the xml_parser_free function, releases the parser from memory. Now that we can create and remove the XML parser, we can proceed to greater things.

Element handlers
In the next example, shown in the code below, I've created three functions and made several XML function calls to register them as handlers in the parser. The functions handle the events that occur when an element starts, when an element ends, and when there is character data.

<?php
// function: startElement
// Deals with the starting element
function startElement( $parser, $tagName, $attrs ) {
    echo "<" . strtolower( $tagName ) . ">";
}

// function: endElement
// Deals with the ending element
function endElement( $parser, $tagName ) {
    echo "</" . strtolower( $tagName ) . "><br>";
}

// function: charElement
// Deals with the text in between tags
function charElement( $parser, $text ) {
    echo "$text";
}

// Create an xml parser
$xmlParser = xml_parser_create();

// Set up element handler
xml_set_element_handler( $xmlParser, "startElement", "endElement" );

// Set up character handler
xml_set_character_data_handler( $xmlParser, "charElement" );

// Open connection to RSS XML file for parsing.
$fp = fopen( "http://www.builderau.com.au/feeds/features.htm", "r" )
    or die( "Cannot read RSS data file." );

// Parse XML data from RSS file.
while( $data = fread( $fp, 4096 ) ) {
    xml_parse( $xmlParser, $data, feof( $fp ) );
}

// Close file open handler
fclose( $fp );

// Free xml parser from memory
xml_parser_free( $xmlParser );
?>

The first function, startElement, handles all the opening XML tags. When the parser comes to a new element (XML tag), it calls startElement with three parameters: the first is the parser resource, the second is the tag name, and the third contains the attributes associated with that element. The attributes arrive in an array that I've named $attrs. Because the parser folds tag names to upper case by default, the function lowercases each name before printing it back out as an XML tag (not a very productive function, but it illustrates what we want to do).
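
To see what the parser actually hands to a start-element handler, here is a minimal sketch that parses a small inline string instead of the live feed. The sample XML, the $seen array, and the use of anonymous functions are assumptions made for illustration; they are not part of the listing above.

```php
<?php
// Hypothetical sample document, used here instead of the live feed.
$sample = '<rss version="2.0"><channel><title>Demo</title></channel></rss>';

// Collects what the parser passes to the start-element handler.
$seen = array();

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Start handler: parser, tag name, attribute array.
    function ( $parser, $tagName, $attrs ) use ( &$seen ) {
        $seen[] = array( $tagName, $attrs );
    },
    // End handler: nothing to do in this sketch.
    function ( $parser, $tagName ) {
    }
);

// The whole document is in one string, so this is also the final chunk.
xml_parse( $parser, $sample, true );
// Older PHP versions would call xml_parser_free( $parser ) here.

// Case folding is on by default, so the names arrive in upper case.
foreach ( $seen as $entry ) {
    echo $entry[0] . " " . json_encode( $entry[1] ) . "\n";
}
?>
```

Running it lists each tag name next to its attribute array, which makes it easy to confirm the parameter order before wiring the handler to a real feed.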

The endElement function does the same thing as startElement, except that it deals with the closing tag of each XML element. To keep things simple, I only print the tag name again as an XML closing tag. Between the calls to startElement and endElement, the parser calls charElement, because the character data sits between the start and end tags. In the charElement function, again, I'm only printing the data out to the user. These functions play a vital role in the processing of XML elements, and you should familiarise yourself with how the parser works with each one. Though I'm only using functions in this simple example, you can also create a class, with its associated methods, to deal with all your XML parsing needs. This gives flexibility and modularity to your applications.

Now that the element functions are defined, there are still some missing pieces. The parser needs to know which functions will handle which events. The xml_set_element_handler function makes that connection between the parser and the handling functions. It takes three arguments: the parser, the start element handler, and the end element handler. The last two arguments are strings that contain the names of the functions the parser will use as user callback functions. Instead of a string, an array containing an object reference and a method name can be used to assign the callback to the parser.
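
As a rough sketch of the array form of the callback, the following registers two methods of an object as the element handlers. The class name TagCounter, its $counts property, and the inline sample XML are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical handler class; the name TagCounter is made up for this sketch.
class TagCounter {
    // Maps each tag name to the number of times it was opened.
    public $counts = array();

    // Called for every opening tag.
    public function startElement( $parser, $tagName, $attrs ) {
        if ( !isset( $this->counts[$tagName] ) ) {
            $this->counts[$tagName] = 0;
        }
        $this->counts[$tagName]++;
    }

    // Called for every closing tag; nothing to do here.
    public function endElement( $parser, $tagName ) {
    }
}

$counter = new TagCounter();
$parser  = xml_parser_create();

// Each callback is an array holding an object and a method name.
xml_set_element_handler(
    $parser,
    array( $counter, "startElement" ),
    array( $counter, "endElement" )
);

// Hypothetical sample document with two items.
xml_parse( $parser, '<rss><channel><item/><item/></channel></rss>', true );
// Older PHP versions would call xml_parser_free( $parser ) here.

print_r( $counter->counts );
?>
```

Keeping the state in the object rather than in global variables is the main attraction of this form, since several parsers can then run side by side without sharing state.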

When we call the xml_set_character_data_handler function, we're telling the XML parser to send all character data to the specified user function. This function accepts two arguments: the first is the parser resource, and the second is the user-defined callback function that will receive the character data from the XML parser.

The user callback function must also accept two arguments: the parser resource and a data string. The parser resource is the parser created by the xml_parser_create function. Instead of a function name, xml_set_character_data_handler can also accept an array containing an object reference and a method name, as described above.
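
One detail worth keeping in mind is that the parser may deliver the text of a single element in more than one call to the character data handler, so a handler that needs the whole string should append each piece it receives. The sketch below assumes a made-up sample element and a hypothetical $pieces array; how many pieces arrive depends on the parser, so the code only joins them.

```php
<?php
// Hypothetical sample; the entity may cause the text to arrive
// in more than one piece.
$sample = '<title>Tom &amp; Jerry</title>';

// Every piece of character data the handler receives, in order.
$pieces = array();

$parser = xml_parser_create();

xml_set_character_data_handler(
    $parser,
    // Handler: parser and a data string, appended rather than overwritten.
    function ( $parser, $text ) use ( &$pieces ) {
        $pieces[] = $text;
    }
);

xml_parse( $parser, $sample, true );
// Older PHP versions would call xml_parser_free( $parser ) here.

// Join the pieces to recover the full text of the element.
$title = implode( '', $pieces );
echo count( $pieces ) . " piece(s): " . $title . "\n";
?>
```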

Now that we have everything set up and defined, we can move on to actually opening and parsing the RSS file. I use the fopen function to open and set a file pointer to the RSS resource, in this case a news feed from Builder AU. If the script can't open the file pointer, it simply stops. After I open the file pointer, I use a while loop to read in the RSS file. As I loop through the file data, I parse the XML data with the xml_parse function.

The xml_parse function takes three arguments; the first two are required, while the last is optional. The first argument is the parser resource; the second is the data to be parsed. In the listing above, I pass the result of the feof function as the third argument to test whether the file pointer has reached the end of the file. If it returns true, the xml_parse function knows that the second argument is the last chunk of data to be parsed for that document or RSS file. After the parsing is complete, I close the open file pointer and free the XML parser, in that order.
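
The listing ignores the return value of xml_parse, so a malformed feed fails silently. As a minimal sketch of one way to check it, the hypothetical helper parseChunks below feeds an array of strings to a parser, marks only the last one as final, and reports the first error using xml_get_error_code, xml_error_string, and xml_get_current_line_number. The helper name and the sample strings are assumptions for illustration.

```php
<?php
// Hypothetical helper: feeds the chunks to a parser one by one and
// returns null on success or a description of the first error.
function parseChunks( array $chunks ) {
    $parser = xml_parser_create();
    $last   = count( $chunks ) - 1;
    $error  = null;

    foreach ( array_values( $chunks ) as $i => $chunk ) {
        // The third argument is true only for the final chunk.
        if ( !xml_parse( $parser, $chunk, $i === $last ) ) {
            $error = sprintf(
                "XML error: %s at line %d",
                xml_error_string( xml_get_error_code( $parser ) ),
                xml_get_current_line_number( $parser )
            );
            break;
        }
    }

    // Older PHP versions would call xml_parser_free( $parser ) here.
    return $error;
}

// A well-formed document split into two chunks.
var_dump( parseChunks( array( '<rss><channel>', '</channel></rss>' ) ) );

// Deliberately broken sample: <title> is never closed.
var_dump( parseChunks( array( '<rss><title>Oops', '</rss>' ) ) );
?>
```

In the feed-reading loop, the same check could wrap the existing xml_parse call so that the script stops with a readable message instead of printing a partial result.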

Printing the results
Now that I have the RSS parsed by PHP, what next? I can set up the functions to print only the information that I want from the RSS file. Take a look at the code below, which is a simple example of how to set up the functions to print only the title, a link to the article, and the description of each item. In this last example, you can get a feel for how the functions work together to display only the elements you want the user to see.

<?php
// Global variables for function use.
$GLOBALS['title'] = false;
$GLOBALS['link'] = false;
$GLOBALS['description'] = false;
$GLOBALS['item'] = false;
$GLOBALS['titletext'] = "";
$GLOBALS['linktext'] = "";
$GLOBALS['desctext'] = "";

// function: startElement
// Deals with the starting element
function startElement( $parser, $tagName, $attrs ) {
    // By setting global variable of tag name
    // I can determine which tag I am currently
    // parsing.
    switch( $tagName ) {
        case 'ITEM':
            $GLOBALS['item'] = true;
            break;
        case 'TITLE':
            $GLOBALS['title'] = true;
            break;
        case 'LINK':
            $GLOBALS['link'] = true;
            break;
        case 'DESCRIPTION':
            $GLOBALS['description'] = true;
            break;
    }
}

// function: endElement
// Deals with the ending element
function endElement( $parser, $tagName ) {
    // By noticing the closing tag,
    // I can print out the data that I want.
    // The text is trimmed and escaped here, once the
    // whole element has been read.
    switch( $tagName ) {
        case 'ITEM':
            // Leaving the item, so tags outside of an
            // item are not printed afterwards.
            $GLOBALS['item'] = false;
            break;
        case 'TITLE':
            if( $GLOBALS['item'] == true ) {
                echo "<p><b>" . htmlspecialchars( trim( $GLOBALS['titletext'] ) ) . "</b><br/>";
            }
            $GLOBALS['title'] = false;
            $GLOBALS['titletext'] = "";
            break;
        case 'LINK':
            if( $GLOBALS['item'] == true ) {
                // Escape the URL as well, since it ends up
                // inside an HTML attribute.
                echo "<a href=\"" . htmlspecialchars( trim( $GLOBALS['linktext'] ) ) . "\">View Article</a><br/>";
            }
            $GLOBALS['link'] = false;
            $GLOBALS['linktext'] = "";
            break;
        case 'DESCRIPTION':
            if( $GLOBALS['item'] == true ) {
                echo " " . htmlspecialchars( trim( $GLOBALS['desctext'] ) ) . "</p>";
            }
            $GLOBALS['description'] = false;
            $GLOBALS['desctext'] = "";
            break;
    }
}

// function: charElement
// Deals with the character elements (text)
function charElement( $parser, $text ) {
    // Verify the tag that text belongs to.
    // I set the global tag name to true
    // when I am in that tag.
    // The text can arrive in several pieces, so I append
    // it untouched and leave the trimming to endElement.
    if( $GLOBALS['title'] == true ) {
        $GLOBALS['titletext'] .= $text;
    } else if( $GLOBALS['link'] == true ) {
        $GLOBALS['linktext'] .= $text;
    } else if( $GLOBALS['description'] == true ) {
        $GLOBALS['desctext'] .= $text;
    }
}

// Create an xml parser
$xmlParser = xml_parser_create();

// Set up element handler
xml_set_element_handler( $xmlParser, "startElement", "endElement" );

// Set up character handler
xml_set_character_data_handler( $xmlParser, "charElement" );

// Open connection to RSS XML file for parsing.
$fp = fopen( "http://www.builderau.com.au/feeds/features.htm", "r" )
    or die( "Cannot read RSS data file." );

// Parse XML data from RSS file.
while( $data = fread( $fp, 4096 ) ) {
    xml_parse( $xmlParser, $data, feof( $fp ) );
}

// Close file open handler
fclose( $fp );

// Free xml parser from memory
xml_parser_free( $xmlParser );
?>

This is a very simple example, in which we display every article enclosed in an ITEM tag. You can extend it in a number of different ways. For example, you could place the RSS feed text into a table with a header, or you could use a counter to limit the number of items displayed; it is up to you. However you decide to use the feed, the parsing is the easy part with the techniques we have looked at here.
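
As one possible take on the counter idea, the sketch below collects item titles into an array and stops adding to it once a limit is reached. The helper name firstItemTitles, its $state array, and the inline sample feed are hypothetical; the sketch works on a string rather than the live feed so that it can be tried without a network connection.

```php
<?php
// Hypothetical helper: returns the titles of at most $limit items.
function firstItemTitles( $xml, $limit ) {
    $state = array(
        'inItem'  => false,
        'inTitle' => false,
        'text'    => '',
        'titles'  => array(),
    );

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        function ( $parser, $tagName, $attrs ) use ( &$state ) {
            if ( $tagName === 'ITEM' ) {
                $state['inItem'] = true;
            } elseif ( $tagName === 'TITLE' && $state['inItem'] ) {
                $state['inTitle'] = true;
                $state['text']    = '';
            }
        },
        function ( $parser, $tagName ) use ( &$state, $limit ) {
            if ( $tagName === 'ITEM' ) {
                $state['inItem'] = false;
            } elseif ( $tagName === 'TITLE' && $state['inTitle'] ) {
                $state['inTitle'] = false;
                // The counter: stop collecting once the limit is reached.
                if ( count( $state['titles'] ) < $limit ) {
                    $state['titles'][] = trim( $state['text'] );
                }
            }
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ( $parser, $text ) use ( &$state ) {
            if ( $state['inTitle'] ) {
                $state['text'] .= $text;
            }
        }
    );

    xml_parse( $parser, $xml, true );
    // Older PHP versions would call xml_parser_free( $parser ) here.

    return $state['titles'];
}

// Hypothetical three-item feed.
$feed = '<rss><channel><title>Feed</title>'
      . '<item><title>One</title></item>'
      . '<item><title>Two</title></item>'
      . '<item><title>Three</title></item>'
      . '</channel></rss>';

foreach ( firstItemTitles( $feed, 2 ) as $title ) {
    echo htmlspecialchars( $title ) . "<br/>\n";
}
?>
```

Note that this still parses the whole feed and only limits what is kept; for the small feeds discussed here that is unlikely to matter.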
