When most administrators think of log analyzers for Web servers, they typically view them as being useful for monitoring Web traffic. However, these log analyzers can also be useful for performance tuning and security auditing, as well as helping to meet the demand for certain types of content on the Web site.
For Apache—the most popular software for setting up a Web server—there are a number of log file analyzers, both commercial and open source. We're going to look at some of the open source solutions that can be used when running Apache on Linux.
Format of log files
All of the log analyzers we'll be looking at have one thing in common: They all use Apache's log file as their source material. From this data, they pull information on many different things, such as source IP addresses, timestamps, and resources being accessed. As long as the log is in a format the analyzer can read, it can work its magic and present the results graphically. Apache typically logs data in either the Common Log Format (originally defined by NCSA) or the Combined Log Format, which adds the referrer and user-agent fields to each entry. For more information on the log formats, see the log file documentation on the Apache Web site.
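To make the format concrete, here is a minimal Python sketch of how a single log entry can be split into the fields the analyzers work with. It is not how any of the tools below is implemented; the regular expression, the parse_line name, and the sample entry are illustrative assumptions.

```python
import re
from datetime import datetime

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
# The last two quoted fields are optional, so Common Log Format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700 (month names assume an English locale).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Apache logs "-" when no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Illustrative sample entry in Combined Log Format.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
entry = parse_line(sample)
print(entry["host"], entry["status"], entry["size"], entry["request"])
```

Once every line is reduced to fields like these, the reports described in the rest of this article are essentially different ways of counting and grouping them.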
The first log analyzer we'll look at is Analog. This one is billed as "The most popular logfile analyzer in the world." It produces a number of detailed reports on successful requests, failed requests, and data transferred, to name a few. Quite a few pre-compiled binaries and source code distributions are available for download. It was also easily installed via Debian's APT system.
After installing Analog, your next step is to modify the /etc/analog.cfg configuration file. Here you will set the location of the log files that Analog will be interpreting. For example, you may configure a LOGFILE entry to point to /var/log/apache/access.log. Multiple files can be specified. You can also modify the hostname that Analog will display here.
In order to get the data into something that you can actually use, you will have to send Analog's output to a particular file. You can also specify what format and what language you would like this in. The formats include HTML, PLAIN, and COMPUTER. The first two are fairly self-explanatory. The COMPUTER format allows you to specify a separator, such as a comma for CSV formatting. This would allow importing into a wide variety of spreadsheets and databases.
The output format is set with the OUTPUT variable, with HTML being the default. As far as languages go, Analog supports 32 languages, including English, French, and German.
Unless you want to redirect Analog's output each time it's run, I'd suggest adding an OUTFILE variable to /etc/analog.cfg. For instance, something like:
OUTFILE /var/www/analog/hostname.%M%D%y.html
By doing this, you can specify a path and even use time codes to allow filenames to be created with the current day, month, and year. With this capability, you can have Analog run multiple times a day (using cron), and it will update the current file or create a new one based on the system clock. This helps automate the process of statistics gathering and eases usability.
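As a rough illustration of how such a template turns into a dated filename, the Python sketch below expands the three codes used in the OUTFILE example. The meanings assigned to the codes (%D as day of month, %M as month number, %y as two-digit year) and the zero-padding are assumptions made for this sketch; check the Analog documentation for the exact behavior. The expand_outfile helper is hypothetical and is not part of Analog.

```python
from datetime import date

def expand_outfile(template, today):
    """Expand the day, month, and year codes in an OUTFILE-style template."""
    # Assumed meanings: %D = day of month, %M = month number, %y = two-digit year.
    codes = {
        "%D": f"{today.day:02d}",
        "%M": f"{today.month:02d}",
        "%y": f"{today.year % 100:02d}",
    }
    for code, value in codes.items():
        template = template.replace(code, value)
    return template

# A fixed date is used here so the result is reproducible.
name = expand_outfile("/var/www/analog/hostname.%M%D%y.html", date(2024, 3, 9))
print(name)
```

Under these assumptions, every run on the same day resolves to the same filename, which is why repeated cron runs update the current file rather than pile up new ones.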
By default, Analog writes to standard output (STDOUT) and has no OUTFILE setting. IMAGEDIR is another setting that is very useful to change from its default. When I installed Analog, the default image directory was /usr/share/doc/analog/images. That's fine, but I preferred to copy that directory to /var/www/analog/images/ and update /etc/analog.cfg accordingly.
Once you've created your configuration file and have begun running Analog, you will start to see the data roll in. There are sections for a general overview as well as monthly, daily, and hourly summaries. Other sections cover file reports, the operating systems used to view your pages, and request data. Historical data is maintained, so you will be able to compare today with yesterday or last month. Figure A shows a brief look at one of the Analog pages.
Figure A: Analog doesn't have the best graphics, but its information is well organized and easy to find.
Analog can also be used with Report Magic, if the standard graphics are not pleasing enough to the eye. All in all, Analog is a robust package and shows plenty of information about the Apache logs.
AWStats is another popular open source log file analyzer. Binaries and source are available for download. Unlike Analog or Webalizer (which we'll look at in a minute), AWStats is written in Perl instead of C. It also boasts the capability to analyze FTP and mail logs. The AWStats Web site also makes note of its ability to report more granular data such as screen sizes, Java support, and Web compression statistics.
The file awstats.pl is the heart of the package and can be run from the command line or as a CGI script. The file /etc/awstats/awstats.conf contains all the package's configuration settings. Here you will need to set the directory where your Apache logs can be found as well as the directory housing the icons used in the AWStats Web interface. By just entering those two settings, you can kick off AWStats and get some nice data on your Web server.
You also need to make sure awstats.pl is in an accessible CGI-BIN directory (for example, /usr/lib/cgi-bin/). This should correspond to the entry in Apache's httpd.conf. Once installed, AWStats can be run by accessing awstats.pl through a URL on your Web server, or on whatever host your log files are stored on. AWStats lists data by month, day, and hour and also attempts to show which countries requests come from, which robots or spiders accessed your host, and which HTTP errors, if any, were found. This data is static and is based on the last time AWStats ran. You can also set the following:
AllowToUpdateStatsFromBrowser=1
This option creates an "Update now" button that allows you to quickly regenerate the reports via the browser without waiting for the next automated run.
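To give a feel for what sits behind a report such as the HTTP errors list, here is a small Python sketch that tallies 4xx and 5xx responses by status code. It only illustrates the idea; it is not how AWStats itself works, and the count_http_errors name and the sample log entries are made up for the example.

```python
import re
from collections import Counter

# Matches the status code that follows the quoted request in Common/Combined Log Format.
STATUS_PATTERN = re.compile(r'" (\d{3}) ')

def count_http_errors(lines):
    """Count responses with a 4xx or 5xx status code, grouped by code."""
    errors = Counter()
    for line in lines:
        match = STATUS_PATTERN.search(line)
        if match and int(match.group(1)) >= 400:
            errors[int(match.group(1))] += 1
    return errors

# Made-up sample entries for illustration only.
sample_lines = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '192.0.2.3 - - [10/Oct/2000:14:02:17 -0700] "GET /old.html HTTP/1.0" 404 209',
    '192.0.2.4 - - [10/Oct/2000:14:05:40 -0700] "POST /cgi-bin/form HTTP/1.0" 500 530',
]
errors = count_http_errors(sample_lines)
for code, hits in sorted(errors.items()):
    print(code, hits)
```

A real analyzer keeps running totals like these between runs, which is why the reports only change when the tool is run again or the update button is used.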
The default GUI for AWStats is a little more graphical and somewhat more detailed than Analog. Figure B provides a look at part of an AWStats page.
Figure B: AWStats provides some slick graphics and bar graphs.
AWStats also allows for a number of appearance-altering settings. You can modify the default images, tell it to use frames, or even use a cascading style sheet (CSS). This is great for administrators who want to control the presentation of data.
In general, AWStats was easy to install and had a much more commercial feel to it than Analog. It also offers the possibility of exporting its data in XML format.
The final Apache log analyzer we're going to look at is Webalizer. The configuration file is located at /etc/webalizer.conf. By default, Webalizer will place its reports under your Apache document root (for example, /var/www/webalizer).
After a quick check to make sure the path to your logs is correct, you can run the webalizer command at the prompt and then browse to the Webalizer URL on your server. You will get a summary page with your host's access statistics broken down by month.
You can drill down into a particular month and get information on page hits, kilobytes transferred, etc. The data provided is more or less identical to that of the previous two analyzers; its method of displaying it is really the only difference. The actual files accessed and their URLs are presented, broken up by bar and pie graphs. The format is easy to read and can also be modified to a certain degree through the configuration file. This includes insertion of HTML code and the ability to modify the default color scheme. Figure C shows the top of the default Webalizer page.
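The monthly breakdown boils down to grouping log entries by month and summing requests and bytes. The Python sketch below shows that idea under simple assumptions: it counts every matching line as a hit, converts bytes to kilobytes by dividing by 1024, and uses made-up sample entries. The monthly_totals function is hypothetical and says nothing about Webalizer's internals.

```python
import re
from collections import defaultdict

# Captures the month and year from the timestamp, and the byte count after the status code.
ENTRY_PATTERN = re.compile(r'\[\d{2}/(\w{3})/(\d{4}):[^\]]*\] "[^"]*" \d{3} (\d+|-)')

def monthly_totals(lines):
    """Return {(year, month): [hits, kilobytes]} for the given log lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = ENTRY_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        month, year, size = match.groups()
        bucket = totals[(int(year), month)]
        bucket[0] += 1
        # Apache logs "-" when no response body was sent.
        bucket[1] += 0 if size == "-" else int(size) / 1024
    return dict(totals)

# Made-up sample entries for illustration only.
sample_lines = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2048',
    '192.0.2.2 - - [11/Oct/2000:09:12:03 -0700] "GET /logo.gif HTTP/1.0" 200 1024',
    '192.0.2.1 - - [02/Nov/2000:08:00:00 -0700] "GET /index.html HTTP/1.0" 304 -',
]
totals = monthly_totals(sample_lines)
for (year, month), (hits, kbytes) in totals.items():
    print(year, month, hits, kbytes)
```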
Figure C: With Webalizer, you get a plethora of information displayed on one long page.
Webalizer also provides an easy means of ignoring specific entries. If you didn't want to include internal HTTP requests, for instance, you could add a line like the following to webalizer.conf:
IgnoreURL /Intranet*
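The effect of an ignore rule like this can be pictured as a filter applied to each requested URL before it is counted. The Python sketch below is a simplified model, not Webalizer's actual matching code: it assumes a trailing * means "starts with" and a pattern without a wildcard matches anywhere in the URL, and it ignores questions such as case sensitivity and leading wildcards. The is_ignored helper and the sample URLs are hypothetical.

```python
def is_ignored(url, patterns):
    """Return True if the URL matches any ignore pattern."""
    for pattern in patterns:
        if pattern.endswith("*"):
            # Trailing wildcard: treat the rest of the pattern as a prefix.
            if url.startswith(pattern[:-1]):
                return True
        elif pattern in url:
            # No wildcard: assumed to match anywhere in the URL.
            return True
    return False

# Made-up request URLs for illustration only.
urls = ["/Intranet/index.html", "/products.html", "/Intranet/hr/forms.html"]
ignore_patterns = ["/Intranet*"]

kept = [url for url in urls if not is_ignored(url, ignore_patterns)]
print(kept)
```

Consult the Webalizer documentation for the precise wildcard rules before relying on a pattern to hide sensitive paths from a published report.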
Webalizer will also need to be configured to run on a periodic basis, either through the cron daemon or another means of automation. The historical data and graphs will be modified at that time and can still be configured to be viewed through the standard Web location.
Webalizer can certainly generate enough data to satisfy most administrators. It offers a pleasing graphical display, similar to AWStats, but it's not quite as easy to install and configure as AWStats.
The three Apache log analyzers discussed here (Analog, AWStats, and Webalizer) are broadly similar, but each has its own techniques and features. All of them aim to provide data gleaned from Apache's logs that can help identify problems and track popular files or URLs.
While accessing the Apache log files directly is always an option, it is hard to get a general overview from the raw data. Log analyzers help put this information into an easy-to-understand format and allow for easier tracking of historical data.
For more information on the similarities and differences between these log analyzers, take a look at this comparison.