RSS, also known as Really Simple Syndication or RDF Site Summary, is a file format that allows Web sites to publish and syndicate their latest content to their users. An RSS "feed" is an XML document, so it can be read by any client capable of parsing XML. A number of such RSS clients exist for both Windows and Linux, and the latest versions of Mozilla Firefox and Internet Explorer also let you subscribe to particular RSS feeds, so that you always have the latest information at your fingertips.
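Because a feed is plain XML, even a general-purpose XML parser can read it. The minimal sketch below uses PHP's bundled SimpleXML extension (not XML_RSS) on a small, made-up RSS 2.0 document; the channel name, headlines and example.com URLs are invented for illustration.

```php
<?php
// A tiny, hypothetical RSS 2.0 document held in a string.
// The closing marker sits in column 0, so no indentation is stripped
// and the XML declaration stays at the very start of the string.
$feedXml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>A sample feed</description>
    <item>
      <title>First headline</title>
      <link>http://www.example.com/story1.html</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://www.example.com/story2.html</link>
    </item>
  </channel>
</rss>
XML;

// Parse the string into a SimpleXMLElement tree (false on malformed XML).
$rss = simplexml_load_string($feedXml);

// Walk every <item> under <channel> and collect its title as a string.
$titles = array();
foreach ($rss->channel->item as $item) {
    $titles[] = (string) $item->title;
}

echo (string) $rss->channel->title, "\n";
echo implode("\n", $titles), "\n";
```

XML_RSS does essentially this work for you and hands back plain PHP arrays, which is what the rest of this article relies on.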
Like any good programming language, PHP can read RSS feeds; one convenient way to do so is the PEAR XML_RSS package. This package is a pre-built code library that lets you dynamically retrieve and parse one or more RSS feeds. It's particularly useful if you need to extract information from an RSS feed and convert it into another format (for example, a MySQL database or a text file), or if you plan to build a customized Web page that aggregates information from multiple RSS sources.
In this document, I'll deal with the latter case, showing you how to use the PEAR XML_RSS package to build a simple RSS client that integrates news headlines from multiple RSS feeds into a single Web page. I'll assume throughout that you have a working Apache and PHP installation, and that you have successfully downloaded and installed the PEAR XML_RSS package and its dependencies.
Note: You can install the PEAR XML_RSS package directly from the Web, either by downloading the package archive or by following the installation instructions on the PEAR Web site.
Getting started
Let's begin with a simple example, one that illustrates how XML_RSS works. Create the following script (Listing A):
Listing A
<?php
// include the class (PEAR installs it as XML/RSS.php)
require_once "XML/RSS.php";
// download and parse RSS data
$rss = new XML_RSS("http://techrepublic.com.com/5150-22-0.xml");
$rss->parse();
// print headlines
print_r($rss->getItems());
?>
Here, the script reads the class definition and then instantiates a new XML_RSS object. The object constructor is passed the URL of the source data, in this case TechRepublic's RSS feed. Next, the parse() method is invoked to parse the XML and extract information from it. Finally, the getItems() method returns a neatly organized nested array of the news items extracted from the feed. Each item has a title, a description, a publication date and the URL of the complete article, as illustrated in the output below (Listing B):
Listing B
Array
(
    [0] => Array
        (
            [title] => Bump the size of your information store to 75GB (Exchange 2003 Standard Edition only)
            [link] => http://techrepublic.com.com/5100-1035_11-6063252.html?part=rss&tag=feed&subj=tr
            [description] => In Service Pack 2, the Exchange developers have provided you with the ability to size the information store to any size you like between 1 and 75 GB, and they chose 18GB as the default. Here's how to change the size yourself.
            [pubdate] => Fri, 21 Apr 2006 00:00:00 PDT
        )
    [1] => Array
        (
            [title] => Learn the pros and cons of Windows Firewall
            [link] => http://techrepublic.com.com/5100-1009_11-6063367.html?part=rss&tag=feed&subj=tr
            [description] => Is Windows Firewall up to the task of securing your network? Mike Mullins has his doubts. In this edition of Security Solutions, he delves into the details of Windows Firewall and weighs its pros and cons.
            [pubdate] => Thu, 20 Apr 2006 13:25:00 PDT
        )
    ...
)
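Listing B shows four fields per item, but feeds vary, and an item may lack, say, a publication date. Assuming that can happen with the feeds you consume (an assumption about real-world feeds, not a documented XML_RSS guarantee), one option is to normalize each item before display. The normalizeItem() helper below is hypothetical and not part of XML_RSS; the second sample item is made up.

```php
<?php
// Hypothetical helper (not part of XML_RSS): give every item the four
// fields seen in Listing B, using fallback values for any the feed
// left out, so display code can use them without isset() checks.
function normalizeItem(array $item): array
{
    $defaults = array(
        'title'       => '(untitled)',
        'link'        => '',
        'description' => '',
        'pubdate'     => '',
    );
    // For string keys, array_merge() lets values from $item override
    // the defaults; keys missing from $item keep their fallback value.
    return array_merge($defaults, $item);
}

// Sample data shaped like the getItems() output in Listing B; the
// second item is a made-up example with no description or pubdate.
$items = array(
    array(
        'title'       => 'Learn the pros and cons of Windows Firewall',
        'link'        => 'http://techrepublic.com.com/5100-1009_11-6063367.html?part=rss&tag=feed&subj=tr',
        'description' => 'Is Windows Firewall up to the task of securing your network?',
        'pubdate'     => 'Thu, 20 Apr 2006 13:25:00 PDT',
    ),
    array(
        'title' => 'A headline with no date',
        'link'  => 'http://www.example.com/story.html',
    ),
);

$normalized = array_map('normalizeItem', $items);
print_r($normalized[1]);
```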
You can also extract meta-information about the feed by replacing the call to getItems() with a call to getChannelInfo(). As the name suggests, this method returns information about the channel itself, including its title, link and description (if available). Here's the code (Listing C):
Listing C
<?php
// include the class (PEAR installs it as XML/RSS.php)
require_once "XML/RSS.php";
// download and parse RSS data
$rss = new XML_RSS("http://techrepublic.com.com/5150-22-0.xml");
$rss->parse();
// print channel information
print_r($rss->getChannelInfo());
?>
And here's a sample of the output (Listing D):
Listing D
Array
(
    [title] => TechRepublic.com
    [link] => http://www.techrepublic.com/
    [description] => Real World. Real Time.Real IT.
)
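Since the title and description appear only if the feed supplies them, display code may want a fallback. Here is a sketch using a hypothetical channelHeading() helper (not part of XML_RSS) that builds the kind of heading used later in Listing F, escaping the values with htmlspecialchars() and linking the title only when a link is present.

```php
<?php
// Hypothetical helper (not part of XML_RSS): turn a getChannelInfo()-style
// array into an HTML heading, coping with feeds that omit the title or link.
function channelHeading(array $info, string $fallbackTitle = 'Untitled feed'): string
{
    $title = (isset($info['title']) && $info['title'] !== '') ? $info['title'] : $fallbackTitle;
    $safeTitle = htmlspecialchars($title, ENT_QUOTES);

    // Only wrap the title in a link when the channel supplies one.
    if (!empty($info['link'])) {
        $safeLink = htmlspecialchars($info['link'], ENT_QUOTES);
        return '<b>The latest from <a href="' . $safeLink . '">' . $safeTitle . '</a></b>';
    }
    return '<b>The latest from ' . $safeTitle . '</b>';
}

// Sample input shaped like Listing D, plus a channel with no title or link.
echo channelHeading(array(
    'title'       => 'TechRepublic.com',
    'link'        => 'http://www.techrepublic.com/',
    'description' => 'Real World. Real Time.Real IT.',
)), "\n";
echo channelHeading(array('description' => 'No title or link here')), "\n";
```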
Working with a single feed
As the previous examples show, XML_RSS does a fairly good job of parsing an RSS feed and converting it into a PHP array. Once this array has been generated, it's easy to process it into a format suitable for display on a Web site, as the next example illustrates (Listing E):
Listing E
<html><head></head>
<body>
The latest from TechRepublic: <p />
<ul>
<?php
// include the class (PEAR installs it as XML/RSS.php)
require_once "XML/RSS.php";
// download and parse RSS data
$rss = new XML_RSS("http://techrepublic.com.com/5150-22-0.xml");
$rss->parse();
// print each headline as a list item; escape the URL and title
// because feed content comes from an outside source
foreach ($rss->getItems() as $item) {
    echo "<li><a href=\"" . htmlspecialchars($item['link']) . "\">" . htmlspecialchars($item['title']) . "</a><br />";
    echo $item['description'] . " (" . $item['pubdate'] . ") <p />";
}
?>
</ul>
</body>
</html>
In this case, the array returned by getItems() is processed using a foreach loop. Each element of the array is itself an array, with elements for the story headline, URL, description and publication date. These elements are extracted and formatted as items in an unordered HTML list. Figure A shows a sample of the result:
Figure A: Array elements
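Because Listing E mixes downloading and printing, its formatting step can't be tried without a live feed. A minimal sketch that pulls the list-building into a hypothetical renderItems() function (not part of XML_RSS) follows; it escapes every field, including the description, which is a stricter choice than Listing E makes and means any HTML inside a description is shown as text. The sample item is invented.

```php
<?php
// Hypothetical helper (not part of XML_RSS): render an array shaped like
// the getItems() output as an unordered HTML list. Every field is escaped
// with htmlspecialchars(), so markup inside a description appears as text.
function renderItems(array $items): string
{
    $e = function ($value) {
        return htmlspecialchars((string) $value, ENT_QUOTES);
    };

    $html = "<ul>\n";
    foreach ($items as $item) {
        $html .= '<li><a href="' . $e($item['link']) . '">' . $e($item['title']) . '</a><br />'
               . $e($item['description']) . ' (' . $e($item['pubdate']) . ")</li>\n";
    }
    return $html . "</ul>\n";
}

// Sample data in the same shape as Listing B (one made-up item).
$sample = array(
    array(
        'title'       => 'Tom & Jerry return',
        'link'        => 'http://www.example.com/story?id=1&src=rss',
        'description' => 'A <b>sample</b> description.',
        'pubdate'     => 'Fri, 21 Apr 2006 00:00:00 PDT',
    ),
);

echo renderItems($sample);
```

In a real page you would call renderItems($rss->getItems()) after parse(); keeping it a function lets you check the markup with canned data.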
Working with multiple feeds
Why stop with just one feed? A little creative coding and you can add as many feeds as you like! Listing F shows you the code:
Listing F
<html><head></head>
<body>
<?php
// include the class (PEAR installs it as XML/RSS.php)
require_once "XML/RSS.php";
// set up array of RSS feeds
$feeds = array("http://techrepublic.com.com/5150-22-0.xml",
               "http://news.linux.com/news.rss",
               "http://rss.slashdot.org/Slashdot/slashdot");
// retrieve each feed
// get channel information and headlines
foreach ($feeds as $f) {
    $rss = new XML_RSS($f);
    $rss->parse();
    $info = $rss->getChannelInfo();
    $items = $rss->getItems();
    // print channel information
?>
<b>The latest from <a href="<?php echo htmlspecialchars($info['link']); ?>"><?php echo htmlspecialchars($info['title']); ?></a></b>:
<p />
<ul>
<?php
    // print headlines and descriptions
    foreach ($items as $item) {
        echo "<li><a href=\"" . htmlspecialchars($item['link']) . "\">" . htmlspecialchars($item['title']) . "</a><br />";
        echo $item['description'] . "<p />";
    }
?>
</ul>
<p />
<?php
}
?>
</body>
</html>
The change from the previous example is simple. Rather than hard-wiring a single feed URL into the object constructor, I've created an array of feed URLs and processed it with a loop. Each iteration of the loop creates a new XML_RSS object for a different source feed, which is then processed in the usual way by calling parse() and getItems(). As an additional enhancement, the getChannelInfo() method discussed previously is used to print the name and URL of each feed at the top of its headline list.
Here's a sample of what the output might look like (Figure B):
Figure B: More than one RSS feed
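Listing F fetches and prints each feed in the same loop and doesn't check whether a download or parse succeeded. One possible refinement, sketched here under stated assumptions, is to collect all feeds first and render afterwards, skipping any that fail. The collectFeeds() function and the loader callables are hypothetical; the commented XML_RSS loader assumes parse() returns a PEAR_Error on failure and was not run here.

```php
<?php
// Hypothetical sketch (not part of XML_RSS): fetch every feed first,
// then render, skipping any feed whose loader reports a failure.
// $loader takes a feed URL and returns
// array('info' => channel array, 'items' => items array), or null on failure.
function collectFeeds(array $urls, callable $loader): array
{
    $feeds = array();
    foreach ($urls as $url) {
        $data = $loader($url);
        if ($data === null) {
            continue; // download or parse failed: leave this feed out
        }
        $feeds[$url] = $data;
    }
    return $feeds;
}

// A loader built on XML_RSS might look like this (requires PEAR, not run here;
// it assumes parse() returns a PEAR_Error on failure, checked with PEAR::isError()):
//
//   $xmlRssLoader = function ($url) {
//       $rss = new XML_RSS($url);
//       if (PEAR::isError($rss->parse())) {
//           return null;
//       }
//       return array('info' => $rss->getChannelInfo(), 'items' => $rss->getItems());
//   };

// Stand-in loader so the sketch runs offline: any URL containing
// "broken" is treated as a failed feed.
$fakeLoader = function ($url) {
    if (strpos($url, 'broken') !== false) {
        return null;
    }
    return array('info' => array('title' => $url, 'link' => $url), 'items' => array());
};

$feeds = collectFeeds(
    array('http://www.example.com/a.rss', 'http://www.example.com/broken.rss'),
    $fakeLoader
);
print_r(array_keys($feeds));
```

Keeping retrieval separate from output lets you test the page layout with canned data, and a dead feed simply drops out instead of leaving an empty heading.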
Of course, you can alter this structure to suit your needs. For example, the script currently displays every headline in each feed; to show only the top five headlines from each feed, replace the inner foreach loop with a for loop and a counter. You could also change the page layout to display news headlines in a drop-down list, allowing a different style of navigation. Play with it a little, and have fun!
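Here is a minimal sketch of both variations, assuming arrays shaped like the getItems() output. The helper names topHeadlines() and headlineDropDown() are hypothetical, the drop-down relies on a small inline JavaScript handler to navigate, and the sample headlines are made up.

```php
<?php
// Keep only the first $limit headlines using a for loop and a counter,
// as suggested above (array_slice($items, 0, $limit) is a shorter alternative).
function topHeadlines(array $items, int $limit = 5): array
{
    $items = array_values($items); // make sure indexes run 0, 1, 2, ...
    $top = array();
    $count = count($items);
    for ($i = 0; $i < $count && $i < $limit; $i++) {
        $top[] = $items[$i];
    }
    return $top;
}

// Render headlines as a drop-down list; choosing an option sends the
// browser to that story's URL via an inline JavaScript onchange handler.
function headlineDropDown(array $items): string
{
    $html = '<select onchange="if (this.value) window.location.href = this.value;">' . "\n";
    $html .= '<option value="">Choose a headline...</option>' . "\n";
    foreach ($items as $item) {
        $html .= '<option value="' . htmlspecialchars($item['link'], ENT_QUOTES) . '">'
               . htmlspecialchars($item['title'], ENT_QUOTES) . "</option>\n";
    }
    return $html . "</select>\n";
}

// Seven made-up headlines; only the first five reach the menu.
$items = array();
for ($n = 1; $n <= 7; $n++) {
    $items[] = array('title' => "Headline $n", 'link' => "http://www.example.com/$n.html");
}
echo headlineDropDown(topHeadlines($items));
```

In Listing F you would pass topHeadlines($items) to the inner loop, or replace the inner list with headlineDropDown($items).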