Maximize Apache Web site analysis by leveraging Apache's session-tracking capabilities.
The Web is what it is today because it's built upon a simple yet powerful protocol, HTTP. Innovation in Web technology is driven as much by what the protocol can’t do as by what it can, and the problem of session state—almost never an issue in the worlds of LAN and WAN—has stimulated some interesting and useful technology for recording what goes on between user and Web site.
Apache is one of the most widely used HTTP server implementations, and many of these innovations come to life on an Apache server. Central to the session state issue is the tracking of client sessions, which serves two purposes: With session-tracking techniques, you can not only control the client session but also enhance the user’s experience as they interact with the server. Put simply, session tracking applied deliberately in Web site planning can make a site more interactive for users.
While this article provides an introduction to the two methods available for session tracking in Apache server, be sure to look for the follow-up articles that will delve into implementation and security.
The state you’re in
The concept is simple: While HTTP is a stateless protocol, your Apache software can work around this limitation by capturing state information, in particular the client’s identity, with every incoming request and outgoing response, and storing it all. This data is extremely useful for analyzing Web site usage and performance, and commercial analysis packages are readily available for this purpose. But you can go further.
If you were to gather all the requests and responses from a day in a Web server’s life, sort them by client ID, and then sort them chronologically within each client ID, any given batch of requests from a specific client would represent a session, from login to logout. If you’ve captured this information and collated it in this way, then you have tracked sessions at your fingertips.
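The collation described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, client IDs, timestamps, and the `collate_sessions` helper are all hypothetical, and a real server log would first need to be parsed into this shape.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical captured requests: (client ID, timestamp, requested path).
requests = [
    ("client-b", "2024-01-01 09:05:00", "/index.html"),
    ("client-a", "2024-01-01 09:02:30", "/catalog.html"),
    ("client-a", "2024-01-01 09:00:00", "/index.html"),
    ("client-b", "2024-01-01 09:07:10", "/checkout.html"),
    ("client-a", "2024-01-01 09:04:45", "/cart.html"),
]


def collate_sessions(records):
    """Group records by client ID, then order each group chronologically."""
    sessions = defaultdict(list)
    for client_id, stamp, path in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        sessions[client_id].append((when, path))
    for hits in sessions.values():
        hits.sort()  # tuples sort by timestamp first
    return dict(sessions)


sessions = collate_sessions(requests)
for client_id in sorted(sessions):
    # Print each client's pages in the order they were requested.
    print(client_id, [path for _, path in sessions[client_id]])
```

Each value in `sessions` is one client's activity in time order, which is the reconstructed session the paragraph describes.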
Taking it further, you can know the specific nature of individual client requests; that is, you can know what Web page a client is on, what screen element they are clicking, and so forth. You can effectively reconstruct their activity on the Web site from the captured requests. Armed with this information, you can do the two important things mentioned above: analyze site usage and tailor the site’s responses to the individual user on subsequent visits (such as when you visit your favorite CD/DVD site on the Web and it presents you with personalized suggestions based on your past purchases).
There are two ways to implement session tracking on an Apache server: the URL-rewrite method and the cookie method.
The URL-rewrite method
This method is useful when you don’t know whether your clients’ browsers have cookies enabled (cookies being the basis of the other method). It has near-universal support and is very simple to implement.
It works like this: When a client initiates a Web session, the server assigns a session ID and embeds it in the URLs it sends back, so the ID is carried in every subsequent HTTP request. In this way, successive requests serve as carriers of session information. It's a simple matter to track down all requests tagged with any particular session ID and thereby reconstruct the session.
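As a rough sketch of the idea, the following Python fragment appends a session ID to a URL as a query parameter and reads it back from an incoming request URL. The parameter name `sid` and the helper functions are hypothetical; this illustrates the general technique rather than anything built into Apache.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sid"  # hypothetical query parameter name


def new_session_id():
    """Generate a hard-to-guess session ID (32 hex characters)."""
    return secrets.token_hex(16)


def add_session_id(url, session_id):
    """Return the URL with the session ID embedded as a query parameter."""
    parts = urlsplit(url)
    # Drop any session ID already present, keep the other parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def extract_session_id(url):
    """Return the session ID carried by a request URL, or None."""
    return dict(parse_qsl(urlsplit(url).query)).get(SESSION_PARAM)


rewritten = add_session_id("/catalog.html?page=2", "abc123")
print(rewritten)
print(extract_session_id(rewritten))
```

Every link the server hands back would pass through something like `add_session_id`, and every incoming request through something like `extract_session_id`.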
The good news is that this style of session tracking works with just about everyone. The bad news is that constantly regenerated URLs are impossible for a user to remember, and bookmarks to them are useless. Another downside is that the office hacker could walk past a client’s computer, write down the URL (which is there for all to see, with a clearly visible session ID), and go elsewhere and hijack the client’s session.
The cookie method
A "cookie" is a small piece of text containing session state information that is passed from server to client and stored either in the client machine’s memory (a per-session cookie) or on its hard drive (a persistent cookie). Cookie data retained on a client machine can serve multiple purposes: It is submitted to the originating server again with new HTTP requests in future sessions, giving the server a means of recognizing past clients. In this way, a Web site can adjust its responses to individual clients based on past activity, as with the URL-rewrite method.
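To make the per-session and persistent distinction concrete, here is a minimal Python sketch that builds the value of a `Set-Cookie` response header with the standard `http.cookies` module. The cookie name `sid`, the helper names, and the 30-day lifetime are assumptions chosen for illustration only.

```python
from http.cookies import SimpleCookie


def session_cookie(session_id):
    """Per-session cookie: no lifetime given, so it ends with the browser session."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()


def persistent_cookie(session_id, max_age_seconds):
    """Persistent cookie: Max-Age asks the browser to keep it between visits."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    cookie["sid"]["max-age"] = max_age_seconds
    return cookie["sid"].OutputString()


# Each string is what would follow "Set-Cookie: " in a response header.
print(session_cookie("abc123"))
print(persistent_cookie("abc123", 30 * 24 * 60 * 60))  # hypothetical 30-day lifetime
```

The only difference between the two is the lifetime attribute; without it, the browser is expected to discard the cookie when the session ends.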
This method of session tracking can be very dynamic. The session ID is embedded in the cookie, so on any particular visit from a frequent user, a previously stored cookie submitted in the HTTP request headers tells the server that user’s previous session ID. The server can then, in theory, return to that session, extract useful details of the prior visit, and adjust the Web site’s responses in real time!
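A hedged sketch of that lookup, again in Python: the server parses the incoming `Cookie` header, pulls out the session ID, and consults whatever it stored about the earlier visit. The `past_sessions` store, its contents, and the `recall_session` helper are hypothetical stand-ins for a real server-side data store.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side store mapping session IDs to remembered details.
past_sessions = {
    "abc123": {"last_page": "/catalog/jazz.html", "purchases": ["CD-1042"]},
}


def recall_session(cookie_header, store):
    """Return stored details for the session ID in a Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    if morsel is None:
        return None  # the client sent no session cookie
    return store.get(morsel.value)  # None if the ID is unknown


details = recall_session("sid=abc123; theme=dark", past_sessions)
if details is not None:
    print("Welcome back; you last viewed", details["last_page"])
```

A real site would use the recalled details to choose what to show, which is the real-time adjustment described above.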
In addition, a number of steps can be taken to enhance session security via cookies, and your server can store cookie data for administrative use. Cookies have the upside of being transparent to the user, but they also have the downside of depending on the browser accepting them, which makes them less universally available than URL rewriting.
Session-tracking security issues
Any time you’re gathering data about either your Web site or your users, you’re creating a hacker target and generating new points of vulnerability in your system. Don’t implement session tracking until you’ve considered the risks and created a plan to deal with them.
As mentioned above, a malicious user can eyeball a session ID in a URL on a friendly client computer and use that information to hijack the user’s session. But that’s not the only way. Consider that cookies sent over an unencrypted connection travel in clear text, and an eavesdropper can easily copy them and snag the session ID.
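One common mitigation, offered here as an illustration rather than as this article's prescribed fix, is to mark the session cookie with the standard `Secure` and `HttpOnly` attributes. `Secure` asks the browser to send the cookie only over HTTPS, and `HttpOnly` keeps it out of reach of page scripts. The cookie name `sid` and the helper name are hypothetical.

```python
from http.cookies import SimpleCookie


def hardened_session_cookie(session_id):
    """Build a Set-Cookie value that limits how the session ID can leak."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    cookie["sid"]["secure"] = True    # send only over encrypted connections
    cookie["sid"]["httponly"] = True  # not readable from page scripts
    return cookie["sid"].OutputString()


print(hardened_session_cookie("abc123"))
```

These attributes reduce exposure but do not remove it; the site must also be served over HTTPS for `Secure` to be of any use.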
Also consider that session tracking essentially means keeping a very detailed log of server activity, and those logs are stored somewhere. It's a simple matter to take a server log file and reconstruct the activity of all of its users. In some cases, this includes the preferences and purchasing patterns of customers. Your users may not take kindly to having this private information stolen. To prevent unauthorized use of these files, you must carefully restrict access permissions on the directories that contain them. I'll cover how to overcome some of these security issues in my follow-up articles.
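As a small sketch of restricting access on a POSIX system, the Python fragment below limits a log directory and the files in it to their owner. The directory here is a temporary one created purely for the demonstration, and the `restrict_to_owner` helper is hypothetical; on a real server you would apply the same idea to wherever your logs actually live, under the account that runs the server.

```python
import os
import stat
import tempfile


def restrict_to_owner(directory):
    """Allow only the owner to enter the directory or read and write its files."""
    os.chmod(directory, stat.S_IRWXU)  # rwx for owner, nothing for others
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # rw for owner only


# Demonstration on a throwaway directory holding a placeholder log file.
log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, "access_log")
with open(log_file, "w") as handle:
    handle.write("placeholder entry\n")

restrict_to_owner(log_dir)
print(oct(stat.S_IMODE(os.stat(log_dir).st_mode)))
print(oct(stat.S_IMODE(os.stat(log_file).st_mode)))
```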