Gather Web server stats with a custom PHP app

You can't always trust the server statistics provided by a third-party hosting company, so developing your own monitoring application is a wise move. See how to use PHP to create a stats-generating app that produces reliable logs.

Most Web hosting companies provide access to Web site statistics, but these server-generated stats often don't give you the complete picture. For example, a misconfigured Web server will not recognize certain file types, and unrecognized files will not appear in the stats. Luckily, you can get the information you need by using PHP to create a custom stats-gathering application.

The structure of the Common Logfile Format (CLF)
NCSA originally designed the CLF for its HTTPd Web server. The format is also used by the CERN HTTPd, a public domain Web server maintained by the World Wide Web Consortium (W3C), and the log file specification is listed on the W3C Web site. Both Microsoft- and UNIX-based Web servers can generate CLF access logs. A CLF entry has the following format:
Host Ident Authuser Time_Stamp "Request" Status_Code File_Size

A typical CLF entry looks like this: - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237

Here is a breakdown of each part of the log entry:
  • Host is the IP address or DNS name of the Web site visitor; in the above example, it's
  • Ident is the remote identity of the person (RFC 931). A dash indicates "none specified."
  • Authuser is the user ID (if the Web server authenticated the Web site visitor).
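  • Time_Stamp is the time returned by the server in the format Day/Month/Year:Hour:Minutes:Seconds Timezone, enclosed in square brackets.
  • Request is the request line sent by the visitor's browser: the HTTP method (for example, GET or POST), the requested resource, and the protocol version.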
  • Status_Code is the status code returned by the server; for example, "200 OK – Successful Browser Request."
  • File_Size is the size of the requested file. In the example, it would be 8,237 bytes.
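
To make the field layout concrete, here is a minimal sketch that splits one CLF entry into the seven parts listed above. The function name parseClfEntry() is hypothetical, and the host 192.0.2.1 is a placeholder from the reserved documentation address range, not a value taken from this article.

```php
<?php
// Sketch: split one CLF entry into its seven fields.
// parseClfEntry() is a hypothetical helper, not part of the article's application.
function parseClfEntry(string $line): ?array
{
    // host ident authuser [timestamp] "request" status size
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // not a well-formed CLF entry
    }
    return [
        'host'      => $m[1],
        'ident'     => $m[2],
        'authuser'  => $m[3],
        'timestamp' => $m[4],
        'request'   => $m[5],
        'status'    => (int) $m[6],
        'size'      => $m[7] === '-' ? 0 : (int) $m[7],
    ];
}

// 192.0.2.1 is a placeholder host used only for illustration.
$entry = parseClfEntry('192.0.2.1 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237');
print_r($entry);
```

A line that does not match the layout returns null, which lets a caller skip damaged entries rather than guess at their contents.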

Server status codes
You'll find the server status specifications developed by the W3C in the HTTP standards. These server-generated status codes indicate the success or failure of a data transfer between browser and server. These codes are usually relayed back to the browser (such as the infamous 404 error, "Page Not Found") or added to the server logs.

Compiling the user data
The first step in creating our custom application is obtaining user data. Each time a user selects a resource on our Web site, we want a log entry to be created. Fortunately, server variables allow us to query the user's browser and obtain the data.

Server variables expose information about the current request, such as the HTTP headers sent by the browser and the details of the connection. REMOTE_ADDR is an example of a server variable. This variable returns the user's IP address:
Example Output:

The following PHP code will display the value of the current user's IP address:
<?php echo $_SERVER['REMOTE_ADDR']; ?>

Let's look at the source code of our PHP application. First, we define the Web site resource we want to track and specify the file size:
// Get the name of the file we are logging
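
One way this step could look is sketched below. It assumes the tracked resource is the cnet.gif file from the sample log entry and that it sits in the same directory as the script. The variable names $fileName and $fileSize match the ones used later in the CLF string, but the values are illustrative.

```php
<?php
// Get the name of the file we are logging
// (assumption: cnet.gif, borrowed from the sample log entry)
$fileName = "cnet.gif";

// Get the size of the file in bytes; fall back to 0 if the file cannot be found
$fileSize = is_file($fileName) ? filesize($fileName) : 0;
```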

You don't have to place these values in static variables. If you have many items to track, you may want to keep them in an array or a database. If that's the case, you might want to reference each item with an external link, something that looks like this:
<a href="weblogger.php?bannerid=123"><img src="cnet-banner.gif" border="0"></a>

where "123" represents the record specific to "cnet-banner.gif." Next, we query the client's browser via server variables. This gives us the data we need in order to add an entry in our log file:
// Get Web site visitor's CLF information
$timeStamp=date("d/M/Y:H:i:s O");
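
The remaining values used in the CLF string can be collected in the same way. The sketch below is one possible version. The choice of server variables is an assumption: REMOTE_IDENT and REMOTE_USER are only set when the Web server provides them, and the hard-coded status code assumes the logging script served the request successfully.

```php
<?php
// Get Web site visitor's CLF information.
// Each lookup falls back to an empty string when the server does not set the variable.
$host         = $_SERVER['REMOTE_ADDR'] ?? "";      // visitor's IP address
$ident        = $_SERVER['REMOTE_IDENT'] ?? "";     // RFC 931 identity; rarely populated
$auth         = $_SERVER['REMOTE_USER'] ?? "";      // set only after HTTP authentication
$timeStamp    = date("d/M/Y:H:i:s O");              // e.g. 22/Apr/2002:22:19:12 -0500
$reqType      = $_SERVER['REQUEST_METHOD'] ?? "";   // GET, POST, ...
$servProtocol = $_SERVER['SERVER_PROTOCOL'] ?? "";  // e.g. HTTP/1.0
$statusCode   = "200";                              // assumption: the request succeeded
```

The ?? operator requires PHP 7 or later; on older versions an isset() check does the same job.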

Then, we check to make sure that the server isn't returning any null values. According to the CLF specification, nulls should be replaced with dashes. So the next chunk of code looks for null values and replaces each one with a dash:
// Add dashes to empty variables (as per the specifications)
if ($host==""){ $host="-"; }
if ($ident==""){ $ident="-"; }
if ($auth==""){ $auth="-"; }
if ($reqType==""){ $reqType="-"; }
if ($servProtocol==""){ $servProtocol="-"; }

Once we have obtained the necessary information, the values are combined into a format that is compatible with the CLF specification:
// Create CLF formatted string
$clfString=$host." ".$ident." ".$auth." [".$timeStamp."] \"".$reqType." /".$fileName." ".$servProtocol."\" ".$statusCode." ".$fileSize."\r\n";
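
The dash substitution and the string assembly can also be wrapped in small functions, which keeps the formatting in one place. This is a sketch rather than the article's own listing: buildClfLine() and dashIfEmpty() are hypothetical names, and the sample values are placeholders.

```php
<?php
// Sketch: build one CLF line from its parts.
// buildClfLine() and dashIfEmpty() are hypothetical helpers, not part of the original listing.
function dashIfEmpty(string $value): string
{
    return $value === "" ? "-" : $value;
}

function buildClfLine(string $host, string $ident, string $auth, string $timeStamp,
                      string $reqType, string $fileName, string $servProtocol,
                      string $statusCode, int $fileSize): string
{
    return sprintf(
        "%s %s %s [%s] \"%s /%s %s\" %s %d\r\n",
        dashIfEmpty($host), dashIfEmpty($ident), dashIfEmpty($auth), $timeStamp,
        dashIfEmpty($reqType), $fileName, dashIfEmpty($servProtocol),
        $statusCode, $fileSize
    );
}

// 192.0.2.1 is a placeholder host used only for illustration.
$line = buildClfLine("192.0.2.1", "", "", "22/Apr/2002:22:19:12 -0500",
                     "GET", "cnet.gif", "HTTP/1.0", "200", 8237);
echo $line;
```

With these inputs the empty ident and authuser values come out as dashes, so the printed entry has the same shape as the sample CLF entry shown earlier.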

Creating a custom log file
The formatted data is now ready to be dumped into our custom log file. First, we create a file naming convention and devise a method to allow new log files to be generated on a daily basis. In the example, each file has a "weblog-" prefix, followed by the date expressed as month/day/year and a .log extension. The .log extension is commonly used to identify server logs. (In fact, most log analyzers look for that particular extension.)
// Get current logfile name using today's date
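
A file name that follows this convention could be built as sketched below. The exact date pattern is an assumption: hyphens are used between month, day, and year because a slash cannot appear inside a file name.

```php
<?php
// Get current logfile name using today's date.
// Assumption: month-day-year separated by hyphens ("m-d-Y").
$logFile = "weblog-" . date("m-d-Y") . ".log";

echo $logFile;
```

Because the name is derived from the current date, the first request after midnight automatically targets a new file.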

Now we want to determine whether the current log exists. If so, we append an entry in the existing weblog. If not, the application creates a new log. (The creation of a new log would most likely happen when the date changes, thus changing the filename.)
// Check to see whether log file exists
if (file_exists($logFile)) {
    // If YES, open the existing log file for appending
    $fileWrite = fopen($logFile, "a");
} else {
    // If NO, create a new log file
    $fileWrite = fopen($logFile, "w");
}

If you receive "Permission Denied" errors during a write or append, change the file permission on the target log folder to allow write operations. The default permissions on most Web servers are read and execute. You can change the file/folder permissions using a CHMOD command or using an FTP client.

Next, we create a file lock to allow only one write operation at a time if two or more users simultaneously query the file:
// Create a "Write" File Lock (LOCK_EX is exclusive; LOCK_SH would let several writers in at once)
flock($fileWrite, LOCK_EX);

Finally, we write the contents of our entry:
// Write CLF entry
fwrite($fileWrite, $clfString);
// Relinquish File Lock
flock($fileWrite, LOCK_UN);
// Close Log File
fclose($fileWrite);
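
The open, lock, write, unlock, and close steps can be gathered into a single function. The following is a sketch under a few assumptions: appendLogEntry() is a hypothetical name, the demo writes to the system temporary directory, and the sample host is a placeholder. It relies on the fact that fopen() in "a" mode creates the file when it does not exist, so no separate existence check is needed.

```php
<?php
// Sketch: append one entry to a log file under an exclusive lock.
// appendLogEntry() is a hypothetical helper name.
function appendLogEntry(string $logFile, string $clfString): bool
{
    // "a" opens for appending and creates the file if it does not exist yet
    $fileWrite = fopen($logFile, "a");
    if ($fileWrite === false) {
        return false; // e.g. permission denied on the log folder
    }
    $written = false;
    // LOCK_EX waits until no other process holds a lock on the file
    if (flock($fileWrite, LOCK_EX)) {
        $written = fwrite($fileWrite, $clfString) !== false;
        fflush($fileWrite);         // push buffered data out before unlocking
        flock($fileWrite, LOCK_UN); // relinquish the lock
    }
    fclose($fileWrite);
    return $written;
}

// Usage: write to a throwaway file in the system temp directory
$demoLog = sys_get_temp_dir() . "/weblog-demo.log";
appendLogEntry($demoLog, "192.0.2.1 - - [22/Apr/2002:22:19:12 -0500] \"GET /cnet.gif HTTP/1.0\" 200 8237\r\n");
```

Returning a boolean lets the calling page decide what to do when the log cannot be written, instead of failing silently.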

Processing the log data
Once the system is in production, clients will want a detailed statistical breakdown of the collected visitor data. Since all custom log files are built using a standard format, any log analyzer will do the job. A log analyzer is a tool that parses large log files to generate pie charts, bar charts, and other statistical graphics. Log analyzers are also used to compile data to provide an overview of the users who have visited your site, the number of unique hits, and so on.
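
As a rough illustration of what an analyzer does with these files, the sketch below counts entries per status code. It is far simpler than any real analysis tool; countByStatus() is a hypothetical name and the three sample lines are invented for the example.

```php
<?php
// Sketch: count entries per status code in CLF-formatted lines.
// countByStatus() is a hypothetical helper; real analyzers do far more.
function countByStatus(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // The status code is the three-digit field right after the quoted request
        if (preg_match('/" (\d{3}) (?:\d+|-)\s*$/', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    ksort($counts); // order the result by status code
    return $counts;
}

// Invented sample entries; hosts are placeholders
$lines = [
    '192.0.2.1 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237',
    '192.0.2.2 - - [22/Apr/2002:22:20:01 -0500] "GET /missing.gif HTTP/1.0" 404 0',
    '192.0.2.1 - - [22/Apr/2002:22:21:45 -0500] "GET /cnet.gif HTTP/1.0" 200 8237',
];
print_r(countByStatus($lines));
```

To run it against a real log, the lines could be loaded with file($logFile, FILE_IGNORE_NEW_LINES).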

Here are a few of the more popular log analysis programs available on the Web:
  • WebTrends is an excellent log analyzer that's useful for large-scale Web sites and enterprise-level networks.
  • Analog is a popular freeware log analyzer.
  • Webalizer is a free analysis program. It produces HTML reports that can be viewed in most Web browsers.

Stick to the standards
You can easily expand the application to generate other kinds of logs. This would allow you to capture more data such as the browser type and referrer. The important lesson here is that using standards or conventions in your programming saves time and simplifies the job in the long run.
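
For instance, appending the referrer and the browser string as two extra quoted fields gives the layout commonly known as the combined log format, which many analyzers also accept. The sketch below shows one way to do that; toCombinedFormat() is a hypothetical name, and reading HTTP_REFERER and HTTP_USER_AGENT assumes the browser sent those headers.

```php
<?php
// Sketch: extend a CLF line to the "combined" layout by appending
// the quoted referrer and browser (user agent) strings.
// toCombinedFormat() is a hypothetical helper name.
function toCombinedFormat(string $clfString, string $referrer, string $userAgent): string
{
    $referrer  = $referrer === "" ? "-" : $referrer;
    $userAgent = $userAgent === "" ? "-" : $userAgent;
    // Drop the line ending, append the two quoted fields, then restore it.
    // Double quotes inside the values are escaped so the line stays parseable.
    return rtrim($clfString, "\r\n")
        . ' "' . addcslashes($referrer, '"') . '"'
        . ' "' . addcslashes($userAgent, '"') . '"'
        . "\r\n";
}

// 192.0.2.1 is a placeholder host used only for illustration.
$clfString = "192.0.2.1 - - [22/Apr/2002:22:19:12 -0500] \"GET /cnet.gif HTTP/1.0\" 200 8237\r\n";
echo toCombinedFormat(
    $clfString,
    $_SERVER['HTTP_REFERER'] ?? "",
    $_SERVER['HTTP_USER_AGENT'] ?? ""
);
```

Missing headers are written as dashes, following the same convention the CLF uses for empty fields.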
