
Put Apache to work as a reliable logging tool

Apache is not just for serving up content; it has myriad other uses as well. For example, it works well as a centralized, easy-to-manage logging tool for a highly distributed system.

If you're like most Web shops, setting up an Apache Web server is second nature. You can probably do it in your sleep: Deploy the RPM, tweak a couple config files, and boom, you're in business. Given how easy it is to set one up, and how widespread the whole HTTP thing has become, it always amazes me that people use Apache only to serve up Web pages. I think there are lots of other uses for this wonderful little tool, and one of those uses is for logging.

Suppose, for example, that you have a distributed system that involves a number of moving parts. You've got a Perl script that harvests an RSS feed, maybe a Java app that inserts the output into a database, and finally a cron job that republishes your site with the new feed. Each piece is written in a different language, and maybe they run on different machines, so how can you make sure that everything ran the way it was supposed to? Well, since each individual piece probably does its own logging, you could check the three individual logs. Or you could have them all hit a common Apache server. Then you can just grep through the Apache log files to see that everything has run as planned.
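To make "hitting a common Apache server" concrete, here is a minimal Python sketch of a check-in. The host name logs.example.com, the function names, and the one-page-per-app naming scheme are all assumptions made for illustration; a Perl script or Java app would do the equivalent with its own HTTP library.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

# Hypothetical address of the logging server; replace with your own.
LOG_SERVER = "http://logs.example.com"


def checkin_url(app_name, base=LOG_SERVER):
    """Build the URL an app requests to leave its mark in the Apache log."""
    return f"{base.rstrip('/')}/{quote(app_name)}.html"


def check_in(app_name, base=LOG_SERVER, timeout=5):
    """Request the app's page; return the HTTP status code, or None."""
    url = checkin_url(app_name, base)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 404.
        # Apache still writes a log line for the request.
        return err.code
    except OSError:
        # Server unreachable or timed out, so nothing was logged.
        return None
```

An app would call something like check_in("perl_harvester") as its last step. A return value of None is the case to worry about, because it means the request never reached Apache.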

Setting it all up
Implementing something like this is pretty straightforward. First, you need to configure your Apache logging server to log the information you need. Your httpd.conf will require two directives:
LogFormat "%t \"%r\" %>s \"%{Referer}i\" %b" myAppFormat
CustomLog logs/myapp_log myAppFormat

Apache can generate multiple logs, with different log formats. Depending on how you installed Apache, there might already be a LogFormat called common that logs into a file called access_log. You can either modify that one or, if you need to keep the access_log separate, just define a new LogFormat and use that in your CustomLog directive. That's how I've done it in the example above, creating a new LogFormat called myAppFormat to generate a log file called myapp_log.

What it all means
This LogFormat is pretty stripped down, but it has everything I want for this particular log. The %t spits out the time, and %r will give me the first line of the request. Notice that I put escaped quotes around that directive, which will make it easier when you have to parse through the log file. The %>s gives me the final status. If there were any internal redirects, all I'm interested in is the status code from the last item Apache actually executed. The %{Referer}i gives the value of the Referer header that came in with the HTTP request. And the %b provides the number of bytes Apache sent in the response body, not counting the headers. Output from this format might look something like this:
[03/Oct/2003:09:11:29 -0700] "GET /index.html HTTP/1.1" 200 "-" 13693

Note that there's a dash where the referrer should be. That's just because in this particular instance, my browser hit the URL directly, so there was no referrer. If you're using this technique to log from stand-alone apps instead of a browser, that's probably what you'll see. Your Perl script and your Java app won't send referrer info in their HTTP requests; they'll just grab a particular page directly from Apache.
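Those escaped quotes pay off as soon as you parse the log. The following Python sketch splits one myAppFormat line into its fields. The names LINE_RE and parse_line are my own, and the pattern assumes the request and referrer strings contain no embedded quote characters, so treat it as a starting point rather than a complete parser.

```python
import re
from datetime import datetime

# One line in the myAppFormat layout: %t "%r" %>s "%{Referer}i" %b
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\]\s+'      # %t, e.g. [03/Oct/2003:09:11:29 -0700]
    r'"(?P<request>[^"]*)"\s+'      # %r, the quoted first line of the request
    r'(?P<status>\d{3})\s+'         # %>s, the final status code
    r'"(?P<referer>[^"]*)"\s+'      # %{Referer}i, logged as "-" when absent
    r'(?P<bytes>\d+|-)'             # %b, logged as - when no body was sent
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    return {
        # %b in strptime expects English month names (the default C locale).
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": parts[0] if parts else None,
        "path": parts[1] if len(parts) > 1 else None,
        "status": int(m.group("status")),
        "referer": None if m.group("referer") == "-" else m.group("referer"),
        "bytes": 0 if m.group("bytes") == "-" else int(m.group("bytes")),
    }
```

Run against the sample line above, parse_line reports a status of 200, a referer of None, and 13693 bytes.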

As for that CustomLog directive, the path is relative to your ServerRoot unless it begins with a slash. So if you need the log in a specific directory outside the ServerRoot, just give an absolute path instead.

Reading the results
You'll probably want to make sure that you have actual pages set up in your Apache docroot, even if they're just little dummy text files. Since we're only using this for logging purposes, it's not absolutely necessary that the requested files actually exist. But if they do, it means the status code will be that nice comforting 200 instead of the nasty old 404.

You might have your Perl script go for a file called perl_harvester.html, and the Java app might try to grab a file called java_db_inserter.html, and so on. That way, the log file will be human readable, so you'll be able to look and see whether all the apps ran on a particular day by checking for their signature GET line in the log file.
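If you'd rather not eyeball the log every morning, the same check can be scripted. This Python sketch reports which signature pages were requested on a given day. The first two page names come from the paragraph above; cron_republisher.html is a hypothetical name for the cron job's page, and the function names are mine. It assumes each log line starts with the %t timestamp, as in the myAppFormat layout.

```python
from datetime import date

# Signature pages, one per app. The third name is hypothetical.
EXPECTED = {
    "/perl_harvester.html",
    "/java_db_inserter.html",
    "/cron_republisher.html",
}


def apps_seen(lines, day: date, ok_only=False):
    """Return the signature paths requested on `day`.

    With ok_only=True, count a request only if its status was 200.
    """
    # Apache's %t looks like [03/Oct/2003:09:11:29 -0700]; match the date part.
    stamp = day.strftime("[%d/%b/%Y:")
    seen = set()
    for line in lines:
        if not line.startswith(stamp):
            continue
        fields = line.split('"')     # [time, request, status, referer, bytes]
        if len(fields) < 3:
            continue
        request = fields[1].split()  # e.g. ["GET", "/perl_harvester.html", ...]
        status = fields[2].split()   # e.g. ["200"]
        if len(request) < 2 or request[1] not in EXPECTED:
            continue
        if ok_only and status[:1] != ["200"]:
            continue
        seen.add(request[1])
    return seen


def missing_apps(lines, day: date, ok_only=False):
    """Return the signature paths that never showed up on `day`."""
    return EXPECTED - apps_seen(lines, day, ok_only)
```

Pass it an open log file and a date, for example missing_apps(open("logs/myapp_log"), date.today()). An empty set means every app checked in.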

Even with this stripped-down format, the log file itself is going to get rather long. You should probably use Apache's piped log capability to send the CustomLog output into a rotation program of some sort to keep the log files at a manageable size. I'll save that topic for my next article.
