Advanced site building with HTML::Mason

Mason, an HTML templating system written in Perl, can help you create complex Web sites. Find out how to use dhandler components for building dynamic pages and filters for post-processing.


By James Scheinblum

In the last installment of Perl Diving, we took a first look at Mason, an HTML templating system written in Perl. This time, we'll dig deeper into the functionality that makes Mason an ideal basis for complex Web sites. Refer to the previous article if you need a primer on installation and syntax.

This time we're going to step through the process of creating a well-structured modular Web site and introduce dhandler components for dynamic pages and filters for post-processing. We'll then look at generating XML versions of documents and caching for speed and performance.

In our examples, we will create a fictional site where every element is stored in a logical place that allows us to make changes to the design as quickly as possible. Our site will be plain, with a title across the top that indicates what portion of the site the user is looking at, a table of contents on the left side, and a common footer across the bottom. Starting with these small elements, we'll assemble a much larger site. Let's dig in and see how to do it.

Creating a modular Web site
You might not think of creating a Web site as object-oriented programming, but many of the same design techniques can ease the job of maintaining your site. Imagine your Web site as a collection of small parts, each easily manageable and understandable. Look at each page in your site and envision which parts are common across the site and which are specific to a page. Once you get comfortable with Mason, you'll find that you can create components for nearly everything.

Mason works best when you organize your site hierarchically, so we must first create a directory structure. For the structural portions in the site—such as the header, the footer, and the table of contents—we'll create an elements directory. We'll also want a directory to store the text of our site, so we'll create a content directory as well. With these two directories in place, we can create our first component, index.html, in the root of our directory structure:

 
<!-- component /v2.0/index.html -->
<& /v2.0/elements/header &>
<table border=0 width=100%>
   <tr width=100% valign="top">
      <td width=20%>
         <& /v2.0/elements/toc &>
      </td>
      <td width=60%>
         <& /v2.0/content/mainblock &>
      </td>
      <td width=20%>
         <& /v2.0/pr/prlist &>
      </td>
   </tr>
</table>
<& /v2.0/elements/footer &>

As you can see in Figure A, we have a header component across the top, a footer component across the bottom, and a three-column layout in the middle. Each column calls another component for its content: the left contains a table of contents in a blue box, and the right holds a list of press releases. We'll address those when we get into dynamic components.

Figure A


In the header element, we want the title atop each page to be unique. So in the component header, we set the document title as an argument in order to let each page specify what its title should be:

 
<!-- component /v2.0/elements/header -->
<html>
   <head>
      <title><% $pagetitle %></title>
      <& /v2.0/elements/header_js &>
   </head>
   <body bgcolor="<% $bgcolor %>">
   <p align="center">
      <font size="+3">
         <b><% $pagetitle %></b>
      </font>
   </p>
<%args>
   $bgcolor=>"#cccccc"
   $company_name=>"MyWebSite.com"
   $pagetitle=>"Welcome to $company_name"
</%args>

The default title is "Welcome to" plus the company name. To change the title, we call the header with the new title as an argument, as in about.html:

 
<!-- component /v2.0/about.html -->
<& /v2.0/elements/header, pagetitle=>"About us" &>
<table border=0 width=100%>
   <tr width=100% valign="top">
      <td width=20%></td>
      <td width=60%>
         This page says everything there is to be said
         and nothing more.
      </td>
      <td width=20%></td>
   </tr>
</table>
<& /v2.0/elements/footer &>
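The override behavior behind these two header calls can be sketched in plain Perl. Mason fills each <%args> variable from the caller's arguments when one is supplied and otherwise evaluates its default; defaults are evaluated top to bottom, which is why $pagetitle's default can refer to $company_name. The function header_title below is a hypothetical stand-in for illustration, not the code Mason actually generates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the header component's <%args> block:
# a caller-supplied value wins; otherwise the default is evaluated,
# in order, so a later default can use an earlier argument.
sub header_title {
    my %ARGS = @_;
    my $company_name = exists $ARGS{company_name} ? $ARGS{company_name} : "MyWebSite.com";
    my $pagetitle    = exists $ARGS{pagetitle}    ? $ARGS{pagetitle}    : "Welcome to $company_name";
    return $pagetitle;
}

print header_title(), "\n";                         # like index.html: "Welcome to MyWebSite.com"
print header_title(pagetitle => "About us"), "\n";  # like about.html: "About us"
```

Because $company_name is resolved first, a caller that passes only company_name still gets a matching default title.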

Finally, here is the footer component:

 
<!-- component /v2.0/elements/footer -->
<br>
   <p align="center">
      <font size=1>copyright 2000-2001, <% $company_name %>.
      </font>
      <br>
      <a href="/mason/v2.0/index.html">back home</a>
   </p>
   </body>
</html>
<%args>
   $bgcolor=>"#ffffff"
   $company_name=>"MyWebSite.com"
</%args>

Once the site is finished, it will be easy to make simple changes, such as changing the copyright or the color of the background, for all pages. However, while building components in this way minimizes the effort in altering common portions of the site, such as in the header or the footer, it still requires us to create a top-level component such as index.html and about.html for every page. It would be much better to create a common template for all pages and dynamically apply it. Mason's dhandler functionality lets us do exactly that.

Dynamic dhandler components
We have seen how simple components can make managing large Web sites easier. Now let's see how we can build new pages automatically, without creating a new component for each new page. For example, suppose our fictional Web site puts out ten new press releases each week. We want each release to look consistent with the rest of the site, but we don't want to have to include all the formatting elements for each one. Mason's dhandler (short for "default handler") solves this: a dhandler is a component that answers requests for pages that don't exist as files in its directory, so a single dhandler can generate every press release page. To the end user, such a page looks like just another page on the site, but behind the scenes, Mason assembles this page from a variety of sources.

When Mason receives an HTTP request for which no corresponding component exists in the directory structure, it looks for a component called dhandler. Mason searches up the directory tree until it either finds a dhandler or reaches the top. For example, if the request is for /mason/v2.0/pr/pr001 and no pr001 component exists in /mason/v2.0/pr, Mason looks first for /mason/v2.0/pr/dhandler, then /mason/v2.0/dhandler, and so on up to the root. It executes the first dhandler it finds and makes the trimmed portion of the URL available through $m->dhandler_arg; when /mason/v2.0/pr/dhandler handles the request, that portion is pr001.
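The search order can be sketched in plain Perl. Here the hash of existing paths stands in for the component tree, and find_dhandler is a hypothetical helper; real Mason also supports configuration (such as a different dhandler file name) that this sketch ignores, so treat it as an illustration of the lookup rather than Mason's implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch: given a requested path and a set of existing
# components, return the dhandler that would run and the trimmed
# part of the path that $m->dhandler_arg would report.
sub find_dhandler {
    my ($path, $exists) = @_;
    return ($path, undef) if $exists->{$path};    # a real component wins

    my @dirs = grep { length } split m{/}, $path;
    my @trimmed;
    while (@dirs) {
        unshift @trimmed, pop @dirs;              # move one segment into the arg
        my $candidate = join('', map { "/$_" } @dirs) . '/dhandler';
        return ($candidate, join('/', @trimmed)) if $exists->{$candidate};
    }
    return;                                       # no dhandler anywhere: a 404
}

my %exists = map { $_ => 1 } qw(/mason/v2.0/pr/dhandler /mason/v2.0/dhandler);
my ($dhandler, $arg) = find_dhandler('/mason/v2.0/pr/pr001', \%exists);
print "$dhandler handles the request with dhandler_arg '$arg'\n";
```

Note how the argument grows as the search climbs: if only /mason/v2.0/dhandler existed, it would receive pr/pr001 rather than pr001.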

Below is a dhandler that formats a press release. We can place as many press releases as we want in the /var/mason/comp/v2.0/pr/prfiles directory as text files, which the dhandler will open and reformat into Web pages.

 
<!-- component /v2.0/pr/dhandler -->
<& /v2.0/elements/header, pagetitle=>"Press Release Archive" &>
<table border=0 width=100%>
   <tr width=100% valign="top">
      <td width=20%>
         <& /v2.0/elements/toc &>
      </td>
      <td width=60% bgcolor="#ffffff">
         <font size=+2>Press Release</font>
         <& /v2.0/pr/formatpr, file=>$m->dhandler_arg &>
      </td>
      <td width=20%>
         <& /v2.0/pr/prlist &>
      </td>
   </tr>
</table>
<& /v2.0/elements/footer &>

And here is the text formatting component formatpr that the dhandler calls. It receives the clipped portion of the request URL (pr001 in our example), loads a text file of the same name, and applies a regular expression that turns each line break into a <br> tag:

 
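<!-- component /v2.0/pr/formatpr -->
%   open(my $fh, '<', $file) or die "Cannot open $file: $!";
%   while (my $line = <$fh>) {
%      $line =~ s/\n/<br>/g;   # turn each line break into an HTML break
<% $line %>
%   }
%   close($fh);
<%INIT>
   # The name comes from the URL, so accept only a plain file name:
   # no slashes and no leading dot.
   die "Invalid press release name" unless defined $file && $file =~ /^\w[\w.-]*\z/;
   $file = "/var/mason/comp/v2.0/pr/prfiles/$file";
</%INIT>
<%ARGS>
   $file=>undef
</%ARGS>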
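Because the file name arrives straight from the URL, it is worth validating it, and escaping the file's contents before they reach the browser. The plain-Perl sketch below shows both steps outside Mason; valid_release_name and format_release are hypothetical helpers, and the allowed-character rule is an assumption you may want to tighten or relax for your own file names.

```perl
use strict;
use warnings;

# Accept only plain file names: a word character first, then word
# characters, dots, or hyphens. This rejects "/" and a leading ".",
# so "../" tricks and hidden files are refused. \z (not $) stops a
# trailing newline from slipping through.
sub valid_release_name {
    my ($name) = @_;
    return defined $name && $name =~ /^\w[\w.-]*\z/;
}

# Escape HTML metacharacters, then turn each newline into a <br>.
sub format_release {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must run first to avoid double-escaping
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/\n/<br>\n/g;
    return $text;
}

print format_release("Acme & Co. ships <b>v2</b>\nMore soon\n");
```

Escaping before the <br> substitution matters: doing it afterward would turn the inserted <br> tags into visible text.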

With this technology, we no longer need to create new components for new pages. Instead, we simply place new text files in the prfiles directory, and the component formats them into Web pages.

Now that we've seen how dhandlers apply a template to a large number of files, let's look at how to change the output of a component on the fly by using filters.

Filters for post-processing
Any component, whether it's a regular component or a dhandler, can contain a filter section that operates after the component executes but before the result is sent back to the browser. Filters are an easy way to alter the content of the component on the fly. In our press release example, we could use a filter to replace all occurrences of a certain word or to ensure that certain words always appear in boldface. In this example, we'll use filters to dynamically adjust the left-hand table of contents that sits on every page so that the current page is in boldface and the rest are links, as shown in Figure B.

Figure B


To generate this dynamic menu, we combine a dhandler with a filter. But instead of creating a giant if-then-else statement to figure out which page we're looking at, we grab the name of the page from the requested URL and use that to search and replace the list of links, changing the current page from a link to boldface. Here's a dhandler that calls the toc component:

 
<!-- component /v2.0/dyn/dhandler -->
<& /v2.0/elements/header &>
<table border=0 width=100%>
   <tr width=100% valign="top">
      <td width=20%>
         <& /v2.0/elements/toc &>
      </td>
      <td width=60%>
         <& "/v2.0/content/$comp" &>
      </td>
      <td width=20%>
      </td>
   </tr>
</table>
<& /v2.0/elements/footer &>
<%init>
   # The part of the URL after /dyn/ names the content component to show
   my $comp = $m->dhandler_arg;
</%init>

And here's the toc (table of contents) component itself, with the filter:

 
<!-- component /v2.0/elements/toc -->
<table border=0 bgcolor="#000000" width=100%>
   <tr width=100%>
      <td width=100%>
         <table border=0 bgcolor="#ccccff" width=100%>
%            foreach my $page (@pages) {
            <tr>
               <td>
               <a href="/mason/v2.0/dyn/<% $page %>"><% $page %></a>
               </td>
            </tr>
%            }
         </table>
      </td>
   </tr>
</table>
<%filter>
   my $path = $m->dhandler_arg;
   # Bold the current page's entry; \Q...\E makes the name match literally
   if (defined $path) {
      s#<a href="/mason/v2.0/dyn/\Q$path\E">(\Q$path\E)</a>#<b>$1</b>#ig;
   }
</%filter>
<%init>
   my @pages = ("About Us", "Products",
                "Customer Support", "Contact us");
</%init>

The toc component builds the list of links by looping through the @pages array defined in its <%init> section. The dhandler stores the name of the requested page in $comp and uses it to call the matching component in /v2.0/content. After the toc component builds its HTML, the filter reads the same page name from $m->dhandler_arg and replaces the matching link with the page name in boldface. The effect is simple, and the code, while a little daunting, is a good example of Mason's power.
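Stripped of the Mason plumbing, the filter is an ordinary substitution over the generated HTML. In this plain-Perl sketch, build_toc and bold_current are hypothetical helpers; \Q...\E quotes the page name so that characters with special meaning in regular expressions (a dot or a parenthesis in a page title, for example) match literally, and the /i flag lets a lowercase URL still match its link.

```perl
use strict;
use warnings;

my @pages = ("About Us", "Products", "Customer Support", "Contact us");

# Build the same kind of link list the toc component produces.
sub build_toc {
    return join '', map { qq{<a href="/mason/v2.0/dyn/$_">$_</a>\n} } @_;
}

# The filter step: replace the link to the current page with bold text.
sub bold_current {
    my ($html, $path) = @_;
    return $html unless defined $path && length $path;   # not under a dhandler
    $html =~ s{<a href="/mason/v2\.0/dyn/\Q$path\E">(\Q$path\E)</a>}{<b>$1</b>}ig;
    return $html;
}

print bold_current(build_toc(@pages), "Products");
```

Because $1 captures the link text rather than the URL, the bold label keeps the capitalization from @pages even when the request used different case.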

Generating XML documents
With Mason's power to give you control over your Web site, you may be eager to start adopting new technologies. One easy technology to embrace with Mason is XML. Because XML documents are text markup much like HTML, you'll find them easy to build and publish. To demonstrate, let's look again at our press release example. This time, we want to syndicate our press releases using the RDF Site Summary (RSS) syndication format so that other Web sites know when we update our press releases. We'll make a component, prlist.xml, that generates an XML file compliant with the RSS specification and sends it out over HTTP:

 
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
   <channel>
      <title>MyWebSite</title>
      <link>http://www.MyWebSite.com</link>
      <description>My Mason Web site</description>
      <language>en-us</language>
      <copyright>Copyright 2000, this Web site</copyright>
      <managingEditor>webmaster@mywebsite.com</managingEditor>
      <webMaster>webmaster@mywebsite.com</webMaster>
%      opendir(my $dh, "/var/mason/comp/v2.0/pr/prfiles/") or die "Cannot read prfiles: $!";
%      foreach my $name (grep { !/^\./ } readdir($dh)) {   # skip ., .., and hidden files
%         my $d = uri_escape($name);
      <item>
         <title><% $name |h %></title>
         <link>http://www.MyWebSite.com/mason/v2.0/pr/<% $d %></link>
         <description><% $name |h %></description>
      </item>
%      }
%      closedir($dh);
   </channel>
</rss>
<%init>
   use URI::Escape;
   $r->content_type("text/xml");
</%init>

When building XML files, make sure you stay compliant with the XML specification. That means the XML declaration must be the very first thing in the component, and you must set the correct content type. Mason sets content types at the last possible moment, which gives components plenty of time to designate one. Beware, however, that components you call from your XML component may set the content type to something other than text/xml, so be sure that you know what the components you call are doing. Lastly, because Mason is designed for HTML files, in which white space is rarely an issue, you must make sure that you do not accidentally introduce extraneous white space into your documents, for example, before the XML declaration.
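Staying compliant also means escaping file names before they land in the feed. The sketch below shows the two escaping jobs in plain Perl: names are percent-encoded for the <link> URL and entity-escaped for element text, while dot-files are skipped. The helpers xml_escape, percent_escape, and rss_items are hypothetical; percent_escape is a simplified stand-in for URI::Escape's uri_escape (which the component itself uses), and the base URL is the article's example domain.

```perl
use strict;
use warnings;

# Entity-escape text destined for XML element content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;        # first, so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Percent-encode everything except RFC 3986 unreserved characters.
sub percent_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# One <item> per visible file, in sorted order.
sub rss_items {
    my ($base, @files) = @_;
    my @visible = grep { !/^\./ } @files;   # skip ".", "..", and hidden files
    my $xml = '';
    for my $name (sort @visible) {
        my $title = xml_escape($name);
        my $link  = xml_escape($base . percent_escape($name));
        $xml .= "<item><title>$title</title><link>$link</link></item>\n";
    }
    return $xml;
}

print rss_items('http://www.MyWebSite.com/mason/v2.0/pr/', '.', '..', 'pr001', 'Q&A 2001');
```

A raw "&" in a file name is enough to make the whole feed ill-formed, which is why both the title and the link pass through an escaping step.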

That said, Mason can support many XML technologies. It is also well suited to building wireless Web sites in WML (Wireless Markup Language).

Caching for performance
The last bit of Mason that we'll look at is the caching architecture. Mason has a simple yet powerful mechanism for caching the output of a component. Caching speeds up component execution, especially when you have a slow data source, such as a database or a network drive, to read from. Mason installs with caching ready to go, and all you need to do is decide where to implement it. Resist the urge to cache everything, however. Cached output stays frozen until it expires, so it is best to cache the output of small, slow components rather than the output of one large component that calls many smaller ones; that way the rest of the page stays dynamic.

Each cache entry has an expiration period—the amount of time that it can exist in the cache before it's determined to be out of date—and a key by which to recall the cached data. What you cache is up to you. It can be virtually any Perl data, from data structures to the output of an entire component.
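The key-plus-expiration idea can be sketched with an ordinary hash. This is not Mason's cache implementation; cache_get_or_set is a hypothetical helper, and the current time is passed in explicitly so that expiry can be demonstrated without waiting.

```perl
use strict;
use warnings;

my %cache;   # key => [expires_at, value]

# Return the cached value for $key if it has not expired; otherwise
# compute it with $code, store it for $ttl seconds, and return it.
sub cache_get_or_set {
    my ($key, $ttl, $now, $code) = @_;
    my $entry = $cache{$key};
    return $entry->[1] if $entry && $now < $entry->[0];
    my $value = $code->();
    $cache{$key} = [ $now + $ttl, $value ];
    return $value;
}

my $calls   = 0;
my $compute = sub { $calls++; return "result #$calls" };
print cache_get_or_set('report', 30, 100, $compute), "\n";  # computed: result #1
print cache_get_or_set('report', 30, 120, $compute), "\n";  # cached:   result #1
print cache_get_or_set('report', 30, 131, $compute), "\n";  # expired:  result #2
```

In a real server you would pass time() as $now; the key is what lets different inputs keep separate entries.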

To demonstrate how you can cache the output of an entire component and to show how much faster caching can make your requests, we'll create a component that calculates the factorial of a number (for example, the factorial of 5 is 5 x 4 x 3 x 2 x 1 = 120). Here is the code for the component:

 
<!-- component /util/factorial -->
<%perl>
   my $fact = 1;
   for my $i (2 .. $number) {
      $fact *= $i;
   }
</%perl>
<h1>The Factorial of <% $number %> is <% $fact %></h1>
<%init>
   # Serve cached output if fresh; the key keeps results for different numbers apart
   return if $m->cache_self(key=>$number, expire_in=>'30 seconds');
</%init>
<%args>
   $number => 1000000
</%args>
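One caveat about this example: Perl keeps large numbers as double-precision floats, so any factorial above 170! overflows and prints as infinity. The large default is useful only because it makes the loop slow enough to time. If you need exact values, the core Math::BigInt module provides bfac, as this sketch shows; float_factorial is a hypothetical helper that mirrors the component's loop.

```perl
use strict;
use warnings;
use Math::BigInt;

# Same loop as the component: fast, but limited to double precision.
sub float_factorial {
    my ($n) = @_;
    my $fact = 1;
    $fact *= $_ for 2 .. $n;
    return $fact;
}

my $inf = 9**9**9;   # a portable way to get infinity in Perl
printf "170! finite? %s\n", float_factorial(170) != $inf ? "yes" : "no";
printf "171! finite? %s\n", float_factorial(171) != $inf ? "yes" : "no";
print "5! = ", float_factorial(5), "\n";
print "25! exactly = ", Math::BigInt->new(25)->bfac, "\n";
```

Exact arithmetic is far slower than the float loop, so caching pays off even more when you switch to it.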

In our test run, this component took 7.946 seconds to execute the first time and 0.539 seconds once its output was cached. Both times include document rendering, which shows how much caching can speed up slow components. This is particularly useful if you need to periodically build a page based on a complex database query.

By now you should be ready to take advantage of Mason for building your next complex Web site. It lets you modularize your site into manageable components, and it allows you to create dynamic content with dhandlers and filters. There is much more that Mason has to offer; just stop by the Mason Web site for more information. Good luck, and happy coding!
