
Handle log file analysis with AWStats

Web developers often aren't given the budget necessary to install requisite site monitoring tools. That's why the freeware AWStats log analysis tool is so valuable. This article introduces AWStats and examines its pros and cons.


Most Web developers understand the inherent value of log file analysis, but getting your client to pay for a "maintenance tool" is often an uphill battle. Enter AWStats, an open source log file analysis tool. It's distributed as freeware, but it still packs the performance punch necessary to satisfy most developers' security paranoia and metrics obsessions.

Open source log file tools
Choosing an open source log file analysis tool means holding every candidate to a few firm criteria. The tool I was looking for had to be open source (GNU GPL), and it had to be well supported and regularly maintained. In the open source world, that usually shows up as a high activity percentile on the project's SourceForge development page. And that's where I found AWStats.

AWStats is short for Advanced Web Statistics, and it's released as freeware under the GNU GPL. It's available as a .zip file, a tarball, and an RPM. Version 1.0 was released in May 2000, and the latest stable version as of this writing is AWStats 5.5, released in late May 2003. [Editor's note: AWStats 5.6 was released in late June 2003.]

A number of open source log file analysis tools are available, and the AWStats developers have done a great job of comparing the most popular ones on their development site. Publishing that comparison signals the kind of confidence and honesty that many developers (and their clients) are looking for in a product.

Basic features
As you'd expect with a log file analysis tool, AWStats analyzes all the data that's provided in your log file. But until I started to use AWStats, I didn't really understand the difference between log file formats. Every log file analyzer relies on the data that your log file captures, so to get the most out of your log file analysis tool, you must make sure that you're logging the information you want analyzed. For instance, the Apache Common Log format (which was the default setting with the Red Hat RPM I installed for my Web server) doesn't capture nearly as much information as the Combined Log format.
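
To make the difference between the two formats concrete, here is a minimal Python sketch (not part of AWStats, which is written in Perl). It assumes the standard Apache layouts, in which the Combined Log format is the Common Log format with two extra quoted fields, the referrer and the user agent. The sample log lines, the regular expression, and the parse_line helper are illustrative assumptions.

```python
import re

# Common Log format:   host ident user [time] "request" status size
# Combined Log format: the same fields plus "referer" and "user-agent".
# The trailing group is optional so one pattern accepts both formats.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical sample lines, made up for this illustration.
common = ('192.0.2.10 - - [10/Jun/2003:13:55:36 -0400] '
          '"GET /index.html HTTP/1.0" 200 2326')
combined = (common + ' "http://www.example.com/start.html" '
            '"Mozilla/4.08 [en] (Win98; I)"')

common_fields = parse_line(common)
combined_fields = parse_line(combined)

# A Common Log line carries no referrer or browser data at all.
print(common_fields["referer"], common_fields["agent"])
print(combined_fields["referer"], combined_fields["agent"])
```

With a Common Log file, the referer and agent fields simply aren't there, so any report built on them (browsers, operating systems, search keyphrases) has nothing to work from.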

AWStats works with multiple log file formats, including W3C and Apache log files, as well as logs from most proxy, streaming, FTP, and mail servers. AWStats works both at the command line, which to be honest is a very painful way of looking at your data, and through a browser-based interface. The browser display can be set up to be framed or frameless.

Installation
Installing AWStats is extremely simple. It's a Perl program that runs in your site's cgi-bin. Essentially, most of what you need to configure lives in the AWStats.conf file. The supplied documentation is pretty good, as far as open source documentation goes. But as with most open source software I've used, you really need to go line by line through the configuration file to tweak the program to your needs.

The real power of this application lies in its customizability. Updating the statistics can be automated on the server as a scheduled process, but I found that from a resource point of view, it was easy enough just to refresh the stats through the browser interface. This puts less of a recurring drain on system resources and helps guarantee that you'll capture the most recent performance numbers.

Reports generated
In a full log analysis, AWStats will show you several statistics, including:
  • Number of visits and number of unique visitors
  • Visits duration and last visits
  • Authenticated users and last authenticated visits
  • Days of week and rush hours (pages, hits, KB for each hour and day of the week)
  • Domains/countries of hosts visitors (pages, hits, KB, 266 domains/countries detected)
  • Hosts list, last visits, and unresolved IP addresses list
  • Most viewed, entry, and exit pages
  • File type
  • Web compression statistics (for mod_gzip or mod_deflate)
  • Browsers used (pages, hits, KB for each browser and each version)
  • OS used (pages, hits, KB for each OS, 31 OS detected)
  • Visits of robots (307 robots detected)
  • Search engines, keyphrases, and keywords used to find your site
  • HTTP errors (Page Not Found with last referrer)
  • Screen size

(Source: AWStats project page)
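
As a rough illustration of how a few of these figures fall out of the raw log data, here is a small Python sketch. It is not how AWStats computes its reports: the records are hypothetical, and "unique visitors" is approximated as distinct hosts, which ignores the session rules a real analyzer applies.

```python
from collections import Counter

# Hypothetical, already-parsed records: (host, hour of day, status, URL).
records = [
    ("192.0.2.10", 9, 200, "/index.html"),
    ("192.0.2.10", 9, 200, "/about.html"),
    ("192.0.2.44", 14, 404, "/missing.html"),
    ("192.0.2.44", 14, 200, "/index.html"),
    ("198.51.100.7", 14, 200, "/index.html"),
]

# Unique visitors, approximated as the set of distinct hosts.
unique_hosts = {host for host, _, _, _ in records}

# Rush hours: number of hits in each hour of the day.
hits_per_hour = Counter(hour for _, hour, _, _ in records)

# Most viewed pages: count only successful requests.
page_counts = Counter(url for _, _, status, url in records if status == 200)

# HTTP errors: URLs that returned Page Not Found.
not_found = [url for _, _, status, url in records if status == 404]

print(len(unique_hosts), "unique hosts")
print("busiest hour:", hits_per_hour.most_common(1)[0][0])
print("most viewed:", page_counts.most_common(1)[0][0])
print("not found:", not_found)
```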

Advanced features
AWStats boasts a number of advanced features that make it extremely useful. The extensibility of the configuration file allows the user to get close to the data. As a Web developer, I host sites for customers but under our own brand name, so it's always important to try to maintain that brand identity. AWStats allows you to insert your own logo and adjust the color scheme to get the interface looking like an extension of your own brand.

Multiple hosts are easily handled with a few documented flags that need to be turned on inside the configuration file. This program also easily handles sites that exist on a load balancer. AWStats has a robust plug-in feature that allows you to add new functionality, including plug-ins to seamlessly integrate AWStats with two popular open source content management systems: PHPNuke and Typo3.

Negatives
AWStats offers several customizable features that require immediate attention upon installation, or erratic results may follow. It's not as simple as just downloading and running. For example, it took me a bit of time to get the multihost setup behaving properly. The search engine robot recognition also required some work. In the first week or so that we ran version 5.5, my logs noted several visits from crawler11.googlebot.com that didn't show up as Robot/Spider visits.
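
To show the kind of check involved in robot recognition, here is a minimal Python sketch. It is an assumption-laden illustration, not AWStats's actual detection logic: the suffix list, the token list, and the looks_like_robot helper are all hypothetical. It flags a request as a robot if the resolved host name ends in a known crawler domain or the user agent contains a known robot token.

```python
# Hypothetical lists for illustration; a real analyzer ships a far larger
# database of known robots.
ROBOT_HOST_SUFFIXES = (".googlebot.com",)
ROBOT_AGENT_TOKENS = ("googlebot", "slurp", "crawler", "spider")


def looks_like_robot(host, user_agent=""):
    """Return True if the host name or user agent suggests a robot."""
    # A resolved host name such as crawler11.googlebot.com is enough
    # on its own, even when the log has no user-agent field.
    if host.lower().endswith(ROBOT_HOST_SUFFIXES):
        return True
    agent = user_agent.lower()
    return any(token in agent for token in ROBOT_AGENT_TOKENS)


print(looks_like_robot("crawler11.googlebot.com"))
print(looks_like_robot("192.0.2.10", "Mozilla/4.08 [en] (Win98; I)"))
```

A host-name check like this only works when the log records resolved names rather than bare IP addresses, and a user-agent check only works when the log format captures the user agent in the first place.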

One of the key features missing from AWStats that's often found in its commercial counterparts (such as WebTrends) is the ability to format and print a complete report document. The myth of the paperless office aside, it's often been a great asset to print out the full report to take to a Web site strategy meeting. The numbers don't lie.

Worth the investment
Overall, AWStats is a very usable and customizable log analysis tool. It's more full-featured than almost any other open source log file analysis tool currently available. The graphical layout of the browser-exported analysis is clean and usable. As a Web development professional who continues to be tasked with more requirements and fewer resources, I was happy to add AWStats to my toolkit.
