
Debugging mod_perl with a symbol audit

Debugging mod_perl can seem like a lesson in programming alchemy. Find out how to leverage symbol audits to debug more effectively.


Why do some perfectly good Perl scripts fail when they run in Apache's mod_perl environment? Too often, restarting the Web server and trying again becomes the standard fix. This article peeks inside the mod_perl environment using a symbol table report and reveals a source of Perl problems specific to mod_perl.


Understanding symbols
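Perl stores the names of all your package (global) variables and subroutines in symbol tables. A symbol table, or stash, is basically a hash that maps each name to a typeglob, which in turn points to the actual chunks of data stored under that name: a scalar, an array, a hash, a subroutine, or a filehandle. There is one symbol table per package. A new package namespace usually appears when you load a module with require or use, because the module's own package statement declares it, and the Perl interpreter simply creates an additional symbol table for it.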
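To make the "symbol table is a hash" idea concrete, here is a small sketch using a hypothetical package named Demo. It lists the keys of %Demo:: (the package's stash) and then follows two of the typeglobs to the data behind them.

```perl
use strict;
use warnings;

# Hypothetical package, used only to populate a symbol table.
package Demo;
our $counter = 1;            # fills the SCALAR slot of *Demo::counter
our @items   = (1, 2, 3);    # fills the ARRAY slot of *Demo::items
sub greet { return 'hi' }    # fills the CODE slot of *Demo::greet

package main;

# %Demo:: is package Demo's symbol table: a hash whose keys are names
# and whose values are typeglobs.
my @names = sort keys %Demo::;
print "Demo symbols: @names\n";

# Follow a typeglob to the data stored under a name.
my $counter_ref = *{ $Demo::{counter} }{SCALAR};
my $items_ref   = *{ $Demo::{items} }{ARRAY};
print "counter = $$counter_ref, items = @$items_ref\n";
```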

Perl isn't a highly structured language like C++; its main tool for organizing code is the package keyword, the basis of modules. For example, you can create a module named People.pm that declares package People. Next, you can write more specialized detail in People::Entertainers. Finally, you can drill down to People::Entertainers::Singers. That's a total of three packages and three symbol tables. This hierarchy can go as deep as you like, although it is only a naming convention: People::Entertainers inherits nothing from People.
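The sketch below builds the three People packages from the example (the $kind variables are hypothetical) and shows how the stashes link together. A nested package's stash hangs off its parent's stash under a key ending in "::", but the packages share nothing else.

```perl
use strict;
use warnings;

# The three hypothetical packages from the People example.
{ package People;                        our $kind = 'person';      }
{ package People::Entertainers;          our $kind = 'entertainer'; }
{ package People::Entertainers::Singers; our $kind = 'singer';      }

# Each package has its own stash. A nested package's stash is reachable
# from its parent's stash under a key that ends in "::".
my @top    = grep { /::$/ } keys %People::;
my @middle = grep { /::$/ } keys %People::Entertainers::;
print "nested under People:               @top\n";
print "nested under People::Entertainers: @middle\n";

# Apart from that lookup path, the packages are unrelated: each $kind is
# a separate variable reached by its own fully qualified name.
print join(', ', $People::kind,
                 $People::Entertainers::kind,
                 $People::Entertainers::Singers::kind), "\n";
```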

Hiding information in Perl
Perl wasn't designed to hide information. Package variables have no equivalent of the C/C++ static keyword: you can access any package variable from anywhere if you just use its fully qualified name. Some modules employ this feature to supply you with literal values, perhaps something like AlarmState::Minor. Most of the time, there's little need to access other packages' symbol tables; digging inside a package is unnecessary if you just want to use its features. Nevertheless, it is technically possible.
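Here is a minimal illustration, assuming a hypothetical AlarmState package that exposes $AlarmState::Minor as a package variable. Any code can reach it by its full name, while a file-scoped my variable, the closest Perl gets to a C static, has no symbol table entry at all.

```perl
use strict;
use warnings;

# Hypothetical module exposing a value as a package variable.
package AlarmState;
our $Minor  = 2;           # package variable: has an entry in %AlarmState::
my  $secret = 'hidden';    # file-scoped lexical: in no symbol table at all

package main;

# Any code can read, or overwrite, a package variable by its full name.
print "AlarmState::Minor = $AlarmState::Minor\n";

# Only the package variable can be found through the symbol table.
for my $name (qw(Minor secret)) {
    printf "%-6s %s\n", $name,
        exists $AlarmState::{$name} ? 'in %AlarmState::' : 'not in any stash';
}
```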

Symbol tables are essential to mod_perl because packages, and their symbol tables, are what keep Perl scripts separated inside the Apache server; that's how mod_perl can run many scripts inside one persistent Perl interpreter. The symbol tables are all fully accessible (because it's Perl), yet each Apache httpd process is reused many times over by different CGI scripts. These two opposing arrangements, open access on one hand and separateness on the other, can lead to confusion inside mod_perl. If you can see the symbols, you can tell whether mod_perl is behaving correctly.

Using Devel::Symdump and Apache::ROOT
The Devel::Symdump module (not installed by default) contains routines that rip through all the symbol tables and extract a list of symbols for you to inspect. It's just a diagnostic tool. The symbols might be subroutines, scalars, arrays, package names, whatever. This module reveals the insides of mod_perl.
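Devel::Symdump does this walking for you. To show roughly what such a walk involves, here is a core-Perl sketch that approximates what Devel::Symdump->rnew($package)->scalars reports. It is not Devel::Symdump's own code. The scalar_symbols function and the Audit::Demo packages are hypothetical, and unlike Devel::Symdump this version lists only scalars that currently hold a defined value.

```perl
use strict;
use warnings;

# Walk a package's stash and every nested stash, returning the fully
# qualified name of each symbol whose SCALAR slot holds a defined value.
sub scalar_symbols {
    my ($package, $seen) = @_;
    $seen ||= {};
    no strict 'refs';                         # stashes are looked up by name
    my $stash = \%{"${package}::"};
    return () if $seen->{$stash}++;           # visit each stash only once
    my @found;
    for my $name (sort keys %$stash) {
        if ($name =~ /^(.+)::$/) {            # "Inner::" marks a nested package
            push @found, scalar_symbols("${package}::$1", $seen);
            next;
        }
        my $entry = $stash->{$name};
        next unless ref \$entry eq 'GLOB';    # skip optimized non-glob entries
        my $value = *{$entry}{SCALAR};
        push @found, "${package}::$name" if defined $$value;
    }
    return @found;
}

# Hypothetical package tree standing in for Apache::ROOT.
{ package Audit::Demo;        our $answer = 42; our $blank; }
{ package Audit::Demo::Inner; our $html = '<p>left over</p>'; }

print "$_\n" for scalar_symbols('Audit::Demo');
```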

Listing A makes brief use of this module. Our naming standard uses an .fgi extension for CGI programs that run under mod_perl (the f stands for fast) and a .cgi extension for CGI programs that don't. There's not much in this script except a bit of code that pretty-prints the symbols so that they're more readable.

If you run this script directly from the command line (or run audit.cgi, which is the same script), you'll see that just loading the CGI.pm module gives you a rather large list of available symbols. If you run the .cgi version through the Web server, the result is the same, although the browser makes the HTML output easier to read. If, however, you request the .fgi version from a Web page, so that mod_perl is at work, the list of symbols will be massively larger. Mod_perl keeps vast amounts of information in memory, and it's all preserved between script invocations. If that information goes awry, nothing's going to work. Try running the .fgi script with the other listing methods described in the Devel::Symdump man page. You need to change only this line:
 
@sym_list = $all_syms->scalars;
 

If you trawl through the list of symbols, you'll see that there is an Apache package with a ROOT subpackage (hence, Apache::ROOT). Each Perl script run via mod_perl is allocated a separate package underneath this point in the package hierarchy. That's the only separation between scripts.
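The following is a simplified sketch of the idea behind Apache::Registry, not its actual code. It maps a script's URI to a package under Apache::ROOT by escaping characters that can't appear in a package name as _XX hex and turning slashes into ::. It then compiles the script source as a handler sub in that package. The helpers package_for and compile_script and the /perl/audit.fgi URI are hypothetical.

```perl
use strict;
use warnings;

# Approximate Registry-style name mangling: escape non-identifier
# characters as _XX hex, turn slashes into ::, prefix with Apache::ROOT.
sub package_for {
    my ($uri) = @_;
    (my $name = $uri) =~ s{([^A-Za-z0-9_/])}{sprintf('_%02x', ord $1)}eg;
    $name =~ s{/+}{::}g;
    $name =~ s{^::}{};
    return "Apache::ROOT::$name";
}

# Compile a script's source as the body of a handler() sub in its package.
sub compile_script {
    my ($uri, $source) = @_;
    my $package = package_for($uri);
    eval "package $package; sub handler { $source\n }";
    die $@ if $@;
    return $package;
}

# Hypothetical script: counts its own invocations in a package variable.
my $pkg = compile_script('/perl/audit.fgi', 'our $hits; return ++$hits;');
my $handler = do { no strict 'refs'; \&{"${pkg}::handler"} };
my @counts;
push @counts, $handler->() for 1 .. 3;

print "$pkg\n";       # the script's private package
print "@counts\n";    # the package variable persists between calls
```

Because the compiled sub stays loaded, $hits keeps counting across calls. This is the same persistence the rest of this article examines.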

Spotting problem variables
The code download also includes two trivial scripts, sampleA.fgi and sampleB.fgi. The only difference between them is that one uses local and the other uses my. If you point the browser at both of these scripts and then run the original audit.fgi, part of the output might look like Figure A.

Figure A


You can see in this sample output that a package has been created for each of the three .fgi scripts, each named after its script: for instance, audit_2efgi. The 2e part is the hexadecimal code for the ASCII period (full stop), which can't appear in a package name. It's the different treatment of the variables that you may find interesting. Local variables (declared in sampleA.fgi) are listed, but my variables aren't. A variable you local-ize is still a package variable, so its name (such as $html) lives on in the script's symbol table between runs. local restores the variable's previous value when its enclosing scope exits, but any package variable that is assigned without local keeps its contents intact until the next time you run the script. Unless you're very careful to initialize everything every time, that's an error-prone arrangement. Who knows what unexpected junk your script might receive the next time it's run?
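To reproduce the Figure A behavior outside Apache, this sketch defines two hypothetical stand-ins for sampleA.fgi and sampleB.fgi as handler subs in Apache::ROOT-style packages and calls each one twice, as repeated requests would. The $last variable is an added assumption. It shows a package variable assigned without local, which keeps its value after the handler returns.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for sampleA.fgi (local) and sampleB.fgi (my).
{
    package Apache::ROOT::sampleA_2efgi;
    our ($html, $last);
    sub handler {
        local $html = '<p>A</p>';   # value restored when handler returns
        $last = $html;              # plain assignment: nothing restores it
        return;
    }
}
{
    package Apache::ROOT::sampleB_2efgi;
    sub handler {
        my $html = '<p>B</p>';      # lexical: never enters a symbol table
        return;
    }
}

# Simulate two requests to each script in the same httpd process.
for (1 .. 2) {
    Apache::ROOT::sampleA_2efgi::handler();
    Apache::ROOT::sampleB_2efgi::handler();
}

my $a_has_html = exists $Apache::ROOT::sampleA_2efgi::{html} ? 1 : 0;
my $b_has_html = exists $Apache::ROOT::sampleB_2efgi::{html} ? 1 : 0;
print "sampleA stash lists html: $a_has_html\n";
print "sampleB stash lists html: $b_has_html\n";
print 'sampleA $last after the run: ', $Apache::ROOT::sampleA_2efgi::last, "\n";
```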

The package for audit.fgi also shows scalars a and b. They appear because the list-sorting code in audit.fgi calls sort, and sort's comparison block works through the package variables $a and $b of the current package. These are never declared with my (use strict exempts them), so they slip in unannounced. Again, this is a situation prone to error. These variables can also survive changes to the code.
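A short sketch shows $a and $b landing in a script's stash. The sorted_names function and the Apache::ROOT::audit_2efgi package are hypothetical stand-ins for the sorting code in audit.fgi.

```perl
use strict;
use warnings;

# Hypothetical stand-in for audit.fgi's sorting code, compiled into the
# script's own package the way mod_perl would compile it.
{
    package Apache::ROOT::audit_2efgi;
    sub sorted_names {
        my @names = @_;
        # sort's comparison block reads the package variables $a and $b
        # of the current package. strict exempts them, and they must not
        # be declared with my, so they land in this package's stash.
        return sort { lc $a cmp lc $b } @names;
    }
}

my @sorted = Apache::ROOT::audit_2efgi::sorted_names(qw(Zeta alpha Mu));
print "@sorted\n";

my @sort_vars = grep { exists $Apache::ROOT::audit_2efgi::{$_} } qw(a b);
print "package variables left behind: @sort_vars\n";
```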

Suppose you temporarily add a scalar (or a subroutine, module, or array) to your code for debugging. Once the script has run under mod_perl, that variable or subroutine will hang around inside the Apache process for all subsequent executions of the script, even after you've deleted it from the source. Or say you entirely replace the script with another but give the new script the same filename. When the replacement runs, it is compiled into the same package, so all the old junk is still there in its run-time environment. Now you know why restarting the server helps: no stored information can survive killing the httpd processes.
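Here is a minimal sketch of the "deleted but still there" effect. It assumes the loader recompiles a changed script into the same package without clearing that package first, as described above. The package name and the helpers compile_into and run_handler are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical package for a script named report.fgi.
my $package = 'Apache::ROOT::report_2efgi';

# Recompile the script body into the same package, without clearing it.
sub compile_into {
    my ($source) = @_;
    eval "package $package; no warnings 'redefine'; sub handler { $source\n }";
    die $@ if $@;
    return;
}

sub run_handler {
    no strict 'refs';
    return &{"${package}::handler"}();
}

# Version 1 of the script has a temporary debugging variable.
compile_into('our $debug_dump = "temporary"; return 1;');
run_handler();

# Version 2: the debugging line has been deleted from the source.
compile_into('return 2;');
my $result = run_handler();

# The deleted variable, and its value, are still in the package.
my $leftover = do { no strict 'refs'; ${"${package}::debug_dump"} };
print "handler now returns $result\n";
print defined $leftover ? "leftover \$debug_dump: $leftover\n" : "no leftover\n";
```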

In the best of possible worlds, mystery variables are caught when you run with the -w switch and use strict. That's always recommended. The problem is that when you're shoulder-deep in a debug session, it's easy to slip out of strict mode or make ugly, temporary changes. Do that just once, and the consequences hang around inside mod_perl afterward.
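The sketch below compiles the same hypothetical fragment twice, once under strict and once with strict vars switched off, as a hurried debugging edit might do. The strict version is rejected at compile time. The lax version quietly creates a package variable that, under mod_perl, would then linger.

```perl
use strict;
use warnings;

# eval q{...} compiles each version at run time so both outcomes can be
# shown from one file.
my $strict_ok = eval q{
    use strict;
    $stray = 1;      # undeclared: strict vars rejects this at compile time
    1;
};
my $strict_error = $@;

my $lax_ok = eval q{
    no strict 'vars';
    no warnings 'once';
    $stray = 1;      # accepted: silently creates the package variable $main::stray
    1;
};

print 'with strict:    ', ($strict_ok ? "compiled\n" : "rejected: $strict_error");
print 'without strict: ', ($lax_ok ? "compiled\n" : "rejected\n");
```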

All this stored information can bite you even worse if you try to be too clever. In stand-alone Perl, you can create top-level variables with a :: prefix ($::name is shorthand for $main::name). Under mod_perl, your script's code is compiled under Apache::ROOT rather than main, so such a variable doesn't land in your script's own package. It lands in main::, which every script run by that httpd process shares, so it hangs around somewhere you didn't expect. That might sound like a neat way to do session identification, but each httpd child process has its own copy, and if Apache is set up to age its servers over time (the default), those variables won't last indefinitely.
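A small sketch shows where a ::-prefixed variable actually goes. The login script package and the $::session_id variable are hypothetical. The point is that the :: prefix always resolves to main::, whatever package the code is compiled in.

```perl
use strict;
use warnings;

# Hypothetical script package, named the way mod_perl would name login.fgi.
{
    package Apache::ROOT::login_2efgi;
    sub handler {
        $::session_id = 'abc123';   # the :: prefix always means main::
        return;
    }
}

Apache::ROOT::login_2efgi::handler();

print 'main::session_id = ', ($main::session_id // 'undef'), "\n";
print 'in the script package? ',
      (exists $Apache::ROOT::login_2efgi::{session_id} ? 'yes' : 'no'), "\n";
```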

Worst of all, you can refer to a symbol belonging to a script that isn't currently running, simply by naming that script's package. In short, if you jump around between package namespaces while inside mod_perl, you're just asking for trouble.
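As an illustration, assume sampleA.fgi ran earlier in the same httpd process and left $html behind (the value here is hypothetical). Any other script can read it by building the package name as a string. Nothing in Perl checks whether sampleA is the script currently running.

```perl
use strict;
use warnings;

# Hypothetical leftover: sampleA.fgi ran earlier in this httpd process.
{
    package Apache::ROOT::sampleA_2efgi;
    our $html = '<p>stale page from sampleA</p>';
}

# Later, a different script reaches into sampleA's package by name, even
# though sampleA isn't running. Only strict refs has to be relaxed.
my $other = 'Apache::ROOT::sampleA_2efgi';
my $value = do { no strict 'refs'; ${"${other}::html"} };
print "borrowed: $value\n";
```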

Summary
When you're at the early testing stage and little works, you don't have much choice. You'll have to restart the Web server regularly if your script messes up the mod_perl environment. Alternatively, you can turn off mod_perl and take a performance hit until your script is stable. Once your script is stable, though, an audit of what's lingering around inside the server is a good way to test how clean your code is. You can do more than look at symbol names too: You can explore the state of the leftover data to any level of detail.
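To go beyond names, a sketch like the one below dumps the contents of every scalar, array, and hash left in a package, using the core Data::Dumper module. The dump_package helper and the Apache::ROOT::cart_2efgi package with its contents are hypothetical. In a real audit you would point it at a package found under Apache::ROOT.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical leftovers in a script's package.
{
    package Apache::ROOT::cart_2efgi;
    our @items = ('book', 'lamp');
    our %seen  = (book => 1);
    our $html  = '<p>old</p>';
}

# Report the contents of every scalar, array, and hash still held in a
# package's symbol table (nested packages are skipped).
sub dump_package {
    my ($package) = @_;
    no strict 'refs';
    my $stash = \%{"${package}::"};
    local $Data::Dumper::Terse    = 1;   # no "$VAR1 =" prefix
    local $Data::Dumper::Indent   = 0;   # one line per value
    local $Data::Dumper::Sortkeys = 1;   # stable hash key order
    my @report;
    for my $name (sort keys %$stash) {
        next if $name =~ /::$/;                  # nested package, not data
        my $entry = $stash->{$name};
        next unless ref \$entry eq 'GLOB';
        for my $slot (qw(SCALAR ARRAY HASH)) {
            my $ref = *{$entry}{$slot} or next;  # slot never used
            next if $slot eq 'SCALAR' && !defined $$ref;
            push @report, "${package}::$name ($slot) = " . Dumper($ref);
        }
    }
    return @report;
}

print "$_\n" for dump_package('Apache::ROOT::cart_2efgi');
```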
