This blog entry is also available as a TechRepublic download.

For a Web site owner or administrator trying to gauge the popularity and usability of his/her Web site, log file analysis is possibly the most powerful tool available. Server log files, which record details of incoming client requests, are rich sources of information on popular URLs, important referrers, concentrations of user activity, geographic usage patterns, and click-through paths.

With this in mind, it’s no surprise that Apache offers administrators tremendous flexibility in customizing its log files. This How do I… tutorial discusses the various configuration directives relevant to this logging mechanism and offers some ideas for customizing the log output for greater utility.

Understanding the default log format

By default, Apache logs each incoming request to its access log, typically located within its logs/ directory. If you open up this log in a text editor, you should see something like this:

- - [14/Nov/2005:22:28:57 +0000] "GET / HTTP/1.0" 200 16440

- - [15/Nov/2005:22:45:56 +0000] "GET / HTTP/1.1" 200 36821

- - [15/Nov/2005:22:45:56 +0000] "GET /index.php?=PHPE9568F35-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 2146

- - [15/Nov/2005:22:45:56 +0000] "GET /index.php?=PHPE9568F34-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 4644

Each line in this file represents an incoming HTTP request, and Apache records information about it using a format known as the Common Log Format (CLF). Reading from left to right, this format contains the following information about the request:

  • the source IP address
  • the client’s identity, as reported by identd (a hyphen if unavailable)
  • the remote user name, if using HTTP authentication (a hyphen otherwise)
  • the date, time, and time zone of the request
  • the actual content of the request
  • the server’s response code to the request
  • the size of the data block returned to the client, in bytes

These fields are separated by spaces, with the date/time field enclosed in square brackets and the request line enclosed in double quotes.

The CLF is fairly standard among Web servers, and most log analysis tools are able to parse it without any special customization.
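To make the field layout concrete, here is a minimal Python sketch that splits one CLF line into the seven fields listed above. The sample line is made up for illustration (its IP address comes from a range reserved for documentation), and the names CLF_PATTERN and parse_clf_line are hypothetical, not part of Apache or any log tool. It assumes the request line contains no embedded double quotes.

```python
import re

# One named group per CLF field, in the order Apache writes them.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) '          # source IP address or host name
    r'(?P<ident>\S+) '         # client identity ("-" when unavailable)
    r'(?P<user>\S+) '          # remote user name ("-" when unauthenticated)
    r'\[(?P<time>[^\]]+)\] '   # date, time, and time zone
    r'"(?P<request>[^"]*)" '   # first line of the request
    r'(?P<status>\d{3}) '      # response code
    r'(?P<size>\d+|-)'         # response size in bytes ("-" when empty)
)


def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line; the IP address is illustrative only.
sample = '192.0.2.10 - - [14/Nov/2005:22:28:57 +0000] "GET / HTTP/1.0" 200 16440'
fields = parse_clf_line(sample)
print(fields["host"], fields["request"], fields["status"], fields["size"])
```

All values come back as strings, so convert the status and size to integers before doing any arithmetic on them.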

You can alter the default logging format to store more information about each request by altering the LogFormat directive in the Apache configuration file. This directive accepts two values: a string consisting of format modifiers, each one representing a particular piece of data about the incoming request, and a human-readable label for the string. Thus, for example, a LogFormat directive for the CLF would look like this:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

You can obtain a complete list of the format modifiers supported by the LogFormat directive from the Apache online manual.
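As a quick way to read a format string, the following Python sketch picks out each modifier in the CLF string shown above and prints what it stands for. The DIRECTIVES table covers only the seven modifiers used in that string, and the helper name describe_format is hypothetical. The simple pattern used here does not handle modifiers with arguments, such as %{Referer}i.

```python
import re

# Meanings of the modifiers used in the CLF string; this is only a
# small subset of what Apache supports.
DIRECTIVES = {
    "%h": "remote host",
    "%l": "remote logname (client identity)",
    "%u": "remote user",
    "%t": "time the request was received",
    "%r": "first line of the request",
    "%>s": "final response status",
    "%b": "response size in bytes",
}


def describe_format(format_string):
    """Return (modifier, meaning) pairs in the order they appear."""
    tokens = re.findall(r'%>?[a-zA-Z]', format_string)
    return [(token, DIRECTIVES.get(token, "not in this table")) for token in tokens]


# The CLF string, written exactly as it appears inside the LogFormat quotes.
common = r'%h %l %u %t \"%r\" %>s %b'
for token, meaning in describe_format(common):
    print(token, "=", meaning)
```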

Creating a custom log format

If the default log format, as described above, is not appropriate for your needs, it’s a simple three-step process to modify it:

Step 1: Define a new log format and assign it a label via the LogFormat directive

Let’s assume for a second that you want only the protocol, request method, date and time, and URL path of each request. Looking at the list of format modifiers, it’s clear that this information is represented by the symbols %H, %m, %t, and %U, respectively. Use these symbols to create a custom log string and assign it the label simple, as below:

LogFormat "%H %m %t %U" simple

Step 2: Tell Apache to use the new format by referencing it in a CustomLog directive

Next, modify the existing CustomLog directive to reference your new format string, via its label, as follows:

CustomLog logs/access.log simple

Step 3: Restart Apache

Once the configuration changes have been saved, restart the Web server for the changes to take effect.

shell> /usr/local/apache/bin/apachectl restart

And now, if you inspect your logs after letting Apache serve a few incoming requests, you’ll see a significantly simpler record of what’s been happening:

HTTP/1.1 GET [12/Oct/2006:16:49:06 +0530] /index.php

HTTP/1.1 GET [12/Oct/2006:16:49:07 +0530] /index.php

HTTP/1.1 GET [12/Oct/2006:16:49:07 +0530] /favicon.ico

HTTP/1.1 GET [12/Oct/2006:16:49:15 +0530] /dev/csstext3.html
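A simpler format is also simpler to process. The Python sketch below reads the four lines shown above, converts each timestamp into a datetime object, and counts requests per URL. The function name parse_simple is hypothetical, and the sketch assumes an English locale so that month names such as "Oct" parse correctly.

```python
from collections import Counter
from datetime import datetime

# The four lines shown above, as written by the "simple" format.
lines = [
    "HTTP/1.1 GET [12/Oct/2006:16:49:06 +0530] /index.php",
    "HTTP/1.1 GET [12/Oct/2006:16:49:07 +0530] /index.php",
    "HTTP/1.1 GET [12/Oct/2006:16:49:07 +0530] /favicon.ico",
    "HTTP/1.1 GET [12/Oct/2006:16:49:15 +0530] /dev/csstext3.html",
]


def parse_simple(line):
    """Split a "%H %m %t %U" line into protocol, method, datetime, and path."""
    protocol, method, rest = line.split(" ", 2)
    stamp, path = rest.rsplit(" ", 1)
    # %t is written as [day/month/year:hour:minute:second zone].
    when = datetime.strptime(stamp, "[%d/%b/%Y:%H:%M:%S %z]")
    return protocol, method, when, path


# Count how many times each URL path was requested.
hits = Counter(parse_simple(line)[3] for line in lines)
for path, count in hits.most_common():
    print(count, path)
```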

Some more examples

If you’d like to log the referrer and browser making the request, you can use a format like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" detailed

CustomLog logs/access.log detailed
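Lines written in the detailed format end with two extra quoted fields. The Python sketch below pulls them out of a single line. The sample line, including its IP address, referrer URL, and browser string, is invented for illustration, and DETAILED_PATTERN is a hypothetical name. As before, it assumes no quoted field contains an embedded double quote.

```python
import re

# The CLF fields, followed by the quoted Referer and User-Agent headers.
DETAILED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical sample line; every value in it is illustrative only.
sample = (
    '192.0.2.10 - - [12/Oct/2006:16:49:06 +0530] "GET /index.php HTTP/1.1" '
    '200 2146 "http://www.example.com/start.html" "Mozilla/5.0 (X11; Linux)"'
)

match = DETAILED_PATTERN.match(sample)
fields = match.groupdict() if match else {}
print("Referrer:", fields.get("referer"))
print("Browser: ", fields.get("agent"))
```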

If you’d like to compare the time taken to serve each request (%T, in seconds) with the size of the response (%b, in bytes), consider the following:

LogFormat "%U %b %T" timediff

CustomLog logs/access.log timediff
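With the timediff format in place, a few lines of Python are enough to total the bytes and seconds spent on each URL. This is a rough sketch: the log lines below are invented sample data, and the function name summarize is hypothetical. It treats a "-" in the size column, which Apache writes when no bytes are sent, as zero.

```python
from collections import defaultdict

# Hypothetical lines in "%U %b %T" order: path, bytes sent, seconds taken.
lines = [
    "/index.php 2146 1",
    "/index.php 4644 3",
    "/downloads/big.zip 1048576 12",
    "/favicon.ico - 0",
]


def summarize(log_lines):
    """Return {path: (requests, total_bytes, total_seconds)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in log_lines:
        # Split from the right so the last two columns are size and time.
        path, size, seconds = line.rsplit(" ", 2)
        entry = totals[path]
        entry[0] += 1
        entry[1] += 0 if size == "-" else int(size)
        entry[2] += int(seconds)
    return {path: tuple(values) for path, values in totals.items()}


for path, (requests, total_bytes, total_seconds) in summarize(lines).items():
    print(path, requests, total_bytes, total_seconds)
```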

How about recording the remote IP address, the remote host, the ID of the server process that handled the request, the first line of the request, the timestamp, the URL path, the query string, and the protocol, all separated by pipes?

LogFormat "%a | %h | %P | %r | %t | %U | %q | %H" complex

CustomLog logs/access.log complex
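A delimiter such as the pipe makes the resulting lines easy to split. The Python sketch below turns one line of the complex format into a dictionary. The sample lines are invented, and the field names in FIELD_NAMES and the function name parse_complex are hypothetical labels chosen to match the order of the modifiers in the format string. It assumes no field contains the sequence " | " itself.

```python
# Labels for the eight fields, in the order of "%a | %h | %P | %r | %t | %U | %q | %H".
FIELD_NAMES = [
    "remote_ip", "remote_host", "pid", "request",
    "time", "path", "query", "protocol",
]


def parse_complex(line):
    """Return a dict of fields, or None if the field count is wrong."""
    parts = line.split(" | ")
    if len(parts) != len(FIELD_NAMES):
        return None
    return dict(zip(FIELD_NAMES, parts))


# Hypothetical sample lines; the second has an empty query string.
with_query = (
    "192.0.2.10 | 192.0.2.10 | 4321 | GET /index.php?page=2 HTTP/1.1 | "
    "[12/Oct/2006:16:49:06 +0530] | /index.php | ?page=2 | HTTP/1.1"
)
without_query = (
    "192.0.2.10 | 192.0.2.10 | 4321 | GET / HTTP/1.1 | "
    "[12/Oct/2006:16:49:07 +0530] | / |  | HTTP/1.1"
)

record = parse_complex(with_query)
print(record["path"], record["query"], record["pid"])
```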

Using more than one log file

If you don’t like the thought of all this information being crammed into a single file, you can separate your logs, sending some data to one file and the rest to others. This is fairly easy to do: define multiple LogFormat directives, each with its own label, and then add a separate CustomLog directive (with a different file location) for each one. Listing A is an example; it sends detailed request information to the main log file, client IP addresses to a second log file, and referrer information to a third one:

Listing A

LogFormat "%h %l %u %t \"%r\" %>s %b" detailed

LogFormat "%h" ip

LogFormat "%U \"%{Referer}i\"" referer

CustomLog logs/access.log detailed

CustomLog logs/ip.log ip

CustomLog logs/referer.log referer

Remember that the user who starts the Web server must be able to write to the logs/ directory, or else none of your logs will be created.

Log analysis tools

Once you’ve got your log files set up the way you want them, you’re probably going to want to analyze the data that’s gradually collecting within them. To conclude, here’s a list of some of the more popular log analysis tools available online, with URLs: