Web server logs contain a lot of information regarding site usage, which can be handy when you're investigating application problems. In this article, I examine Web server logging and its contents.
Standards exist for Web server logging; these standards apply to what is contained within the log files. It's important to know that a Web server may have one or more log files. For example, the Apache platform has an access log, as well as referrer and agent logs.
- The access log is the most important file when you're curious about the who and why of site visitors. An entry is made in the access log every time someone requests a file from your Web site regardless of whether the access attempt is successful.
- The referrer log contains data on where a client was before coming to the site.
- The agent log tells you the name and version of the browser that requested a file on your server.
Who asked for what?
Developers are often interested in what was requested and the status of the request when debugging page problems. With that said, the access log is the most important file as it identifies all requests and the status of the requests. The access log has two formats: common and extended. The common log format contains the following data columns:
- The first column identifies the host computer requesting the Web server resource. The value in this field is either the fully qualified domain name or the IP address of the remote host.
- The second column identifies users by their username per RFC 931. It is rarely used, so it is common to see a hyphen (-) in its place.
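- The third column is the user authentication field. It holds the username supplied when the resource is protected by HTTP authentication; otherwise it contains a hyphen (-).
- The fourth column contains the timestamp of the request, enclosed in square brackets. The format for the timestamp field is: DD/Mon/YYYY:HH:MM:SS OFFSET, where Mon is the three-letter month abbreviation (for example, 10/Oct/2000:13:55:36 -0700).
- The fifth column is the HTTP request, which has information on: the method (POST, GET, etc.) the remote client used to request the information; the file the remote client requested; and the HTTP version the client used to retrieve the file.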
- The sixth column identifies the status of the request. Using this value, you can easily determine whether a resource was correctly transferred, not found, and more.
- The seventh column indicates the number of bytes transferred to the client as a result of the request. If no content was returned to the client (for example, with a 304 not modified response), this field will contain a hyphen (-) or a zero to indicate that no data was transferred.
The extended format adds two more columns: the referrer address of the page accessed before the resource request in the eighth column, and the user agent or browser used by the client in the ninth.
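The column layout described above lends itself to a regular expression. The following is a minimal Python sketch, not a definitive parser: it assumes fields are separated by single spaces, that the request, referrer, and user agent are wrapped in double quotes, and that the referrer and user agent appear only in the extended format. The names LOG_PATTERN and parse_line, and the sample line, are illustrative.

```python
import re
from datetime import datetime

# Seven common-format columns, followed by the two optional
# extended-format columns (referrer and user agent).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A hyphen in the bytes column means no data was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # Timestamp layout: day/month abbreviation/year:hour:minute:second offset
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

# Illustrative sample line in the extended format.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')

entry = parse_line(sample)
print(entry["host"], entry["status"], entry["bytes"], entry["request"])
```

Because the last two columns are optional in the pattern, the same function accepts common-format lines; in that case the referrer and agent keys are None.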
These standards are supported by Apache and most other Web platforms. If you are using Microsoft IIS, the common and extended formats are available along with custom options. IIS defaults to the W3C extended format, in which a #Fields directive at the top of the file names the columns that follow.
Putting it to use
You can use the access log data to debug application requests. One personal example is a recent stint with a client who had two Web applications on two different Web servers in two cities. Each application interfaced with the other via ASP.NET Web services to exchange data. There seemed to be a communication problem between the applications as certain operations were not occurring as planned.
I began to debug the application with a glance at the Windows event log on each server to locate any possible application errors that kept them from functioning. The next step was a survey of the Web server log on each side to investigate whether the requests were actually received by the server and how they were processed (if they were received). Listing A contains an excerpt from one of the logs.
The sample is from a Windows 2003 Server running IIS using the extended log format. The first line shows a request made on September 1, 2006 from the 192.168.1.100 address using a POST request for the specific resource. The status returned for each request is 500, and the final column displays the user agent used for the request. In order to properly review the request, you need to have a basic knowledge of the status codes.
There are five basic classes of status codes:
- 1XX: continue or protocol change
- 2XX: success
- 3XX: redirection
- 4XX: client error/failure
- 5XX: server error
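Since the class is determined by the first digit of the code, it can be derived with integer division. This short Python sketch illustrates the idea; the names STATUS_CLASSES and status_class are hypothetical helpers, not part of any library.

```python
# Class descriptions keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "continue or protocol change",
    2: "success",
    3: "redirection",
    4: "client error/failure",
    5: "server error",
}

def status_class(code):
    """Map a three-digit HTTP status code to its class description."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for code in (101, 200, 301, 404, 500):
    print(code, status_class(code))
```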
Each class contains its own set of status codes. The following list provides a sampling of these status codes:
- 100: continue
- 101: switching protocols
- 200: file transfer OK
- 201: created
- 202: accepted
- 301: moved permanently
- 400: bad request
- 401: unauthorized (client must authenticate to access file)
- 403: client forbidden from accessing file or directory
- 404: file not found
- 405: method not allowed
- 408: request time-out
- 415: unsupported media type
- 500: internal server error
- 501: not implemented
- 503: service unavailable
The 500 error is a common status code that is returned when there are problems with the application or Web server. The problem with the status code is that it provides very little additional information; however, it does let you know that the request was actually made, so you can quickly rule out any network or connectivity problems and turn your attention to the actual application.
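Picking the failing requests out of a large log by eye is tedious, so a small script can help. The sketch below is one possible approach in Python, assuming a W3C extended log whose #Fields directive includes an sc-status column. The function name server_errors is hypothetical, and the sample lines are invented for illustration (they are not the article's Listing A); the resource paths in particular are made up.

```python
def server_errors(lines):
    """Yield one dict per logged request whose sc-status is in the 5XX class."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns used by the lines that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            entry = dict(zip(fields, line.split()))
            if entry.get("sc-status", "").startswith("5"):
                yield entry

# Invented sample data; the paths are hypothetical.
sample_log = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2006-09-01 14:02:11 192.168.1.100 POST /services/orders.asmx 500
2006-09-01 14:02:15 192.168.1.100 GET /default.aspx 200
""".splitlines()

for entry in server_errors(sample_log):
    print(entry["date"], entry["time"], entry["c-ip"],
          entry["cs-method"], entry["cs-uri-stem"], entry["sc-status"])
```

Reading the column names from the #Fields directive, rather than hard-coding positions, keeps the sketch usable when the server is configured to log a different set of fields.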
Tracking down and debugging Web application issues is often a tedious process. Depending on the problems, there are many areas of the application to investigate. One area that is often overlooked by developers is the Web server log. While it provides a wealth of information on Web server requests and users, it can also be used to investigate Web application issues such as the requests being made and the status of the requests.
Tony Patton began his professional career as an application developer earning Java, VB, Lotus, and XML certifications to bolster his knowledge.