An often-overlooked facet of Web development is the great stuff going on beneath the hood—the underlying protocols that permit communication between the client browser and the Web server. It's worth learning about what's going on there because you can harness an amazing amount of useful data. For example, you can create adaptive Web applications that degrade gracefully based on the user's browser capabilities, and you can change header information to deliver customized pages to a variety of platforms, including wireless phones and PDAs. This article shows you how to programmatically retrieve and use header information.
The HTTP protocol
The architecture of most Web communications is based on the HTTP protocol, which was standardized through the IETF and the W3C as the Web's popularity increased. The best way to examine the HTTP protocol in action is by using a packet monitor. Packet monitors (otherwise known as packet sniffers or network monitors) allow you to intercept the data packets that are regularly transmitted between your PC and remote Internet servers.
If you have Windows 2000 or Windows XP, I would advise using freeware such as AnalogX PacketMon. Windows NT has a built-in Network Monitor that is part of the Systems Management Server (SMS). For other Windows users, WinDump will do the trick. (One small word of caution: This freeware has no GUI; it's all command line. Refer to the Web site for the instructions and command switches.) If you're a Mac user, EtherPeek shareware by WildPackets is your best bet. For UNIX users, tcpdump is the standard for capturing data packets.
A raw data packet looks something like this sidebar. The packet monitor displays each packet as a stream of hexadecimal byte values, with the plain text those bytes represent shown alongside. It looks a bit daunting, but no worries. We will concentrate primarily on making sense of the readable text on the right.
Request and response
The HTTP protocol is based on request and response. To examine the dialogue between server and client, I decided to visit the popular Google search engine while my packet monitor was running. The following information was initially sent to Google's servers in packet form:
GET / HTTP/1.1..
Accept-Encoding: gzip, deflate.
The request begins with a request line that uses the GET method, followed by headers carrying metainformation about the MIME types, languages, and encodings the browser accepts, as well as its user agent. The request line asks Google's servers for the root document (/) using version 1.1 of the HTTP protocol. The Accept: */* header tells Google's Web servers that my browser is able to accept all MIME types. MIME stands for Multipurpose Internet Mail Extensions. It is used to identify file types (such as images, documents, and applications) and helps your browser and operating system cope with them. The Accept-Language request header indicates that the browser is set to American English.
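To make the structure of a request concrete, here is a minimal JavaScript sketch that splits the text of a request into its request line and its headers. The function name parseRequestHead is my own invention, and the sample request is an illustration assembled from the headers described above, not the exact packet from my capture.

```javascript
// Split the head of a raw HTTP request into its request line and header fields.
// parseRequestHead is a hypothetical helper written for this illustration.
function parseRequestHead(raw) {
  // Lines end in CRLF on the wire; drop the blank line that ends the head.
  const lines = raw.split(/\r?\n/).filter((line) => line.length > 0);
  const [method, target, version] = lines[0].split(" ");
  const headers = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip anything that is not "Name: value"
    // Header names are case-insensitive, so store them in lowercase.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, target, version, headers };
}

// Illustrative request text (an assumption, not the captured packet).
const sampleRequest = [
  "GET / HTTP/1.1",
  "Accept: */*",
  "Accept-Language: en-us",
  "Accept-Encoding: gzip, deflate",
  "",
  "",
].join("\r\n");

const parsedRequest = parseRequestHead(sampleRequest);
console.log(parsedRequest.method, parsedRequest.target, parsedRequest.version); // GET / HTTP/1.1
console.log(parsedRequest.headers["accept-encoding"]); // gzip, deflate
```

The sketch ignores edge cases such as folded header lines and repeated header names, which a production parser would need to handle.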
The Accept-Encoding header tells Google's servers that the incoming document can be compressed using either GNU zip (gzip), a UNIX-based compression scheme, or deflate. Finally, the User-Agent header identifies the client as a Mozilla-compatible browser (in this case, Internet Explorer). Here is the response header I received from Google:
HTTP/1.1 200 OK.
Date: Tue, 15 Dec 2002 00:18:45 GMT.
The first line, known as the status line, acknowledges the HTTP protocol and version. It also carries status code 200 to let our browser know that the request was received and everything is okay. There are many status codes to deal with all sorts of circumstances, including the all-too-familiar "404 Not Found" code. The HTTP specification explains all of the defined status codes.
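As a rough illustration of how a client might read that first line, the JavaScript sketch below pulls the version, status code, and reason phrase out of a status line and sorts the code into one of the five classes defined by HTTP. The helper names parseStatusLine and statusClass are hypothetical.

```javascript
// Break a status line such as "HTTP/1.1 200 OK" into its parts.
// Returns null when the text does not look like a status line.
function parseStatusLine(line) {
  const match = /^HTTP\/(\d+\.\d+) (\d{3})(?: (.*))?$/.exec(line.trim());
  if (!match) return null;
  return { version: match[1], code: Number(match[2]), reason: match[3] || "" };
}

// The first digit of the status code determines its class.
function statusClass(code) {
  if (code >= 100 && code < 200) return "informational";
  if (code >= 200 && code < 300) return "success";
  if (code >= 300 && code < 400) return "redirection";
  if (code >= 400 && code < 500) return "client error";
  if (code >= 500 && code < 600) return "server error";
  return "unknown";
}

const okStatus = parseStatusLine("HTTP/1.1 200 OK");
console.log(okStatus.code, statusClass(okStatus.code)); // 200 success
console.log(statusClass(404)); // client error
```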
The Date header gives us the system date on Google's server. The Cache-Control header indicates that the response is meant for my browser's cache and shouldn't be placed in a shared cache. This makes a great deal of sense because Google wouldn't want gateway servers and proxies to keep cached copies of Google result pages containing every search word under the sun. The Content-Type header prepares my browser to receive the HTML page by identifying the incoming data as a text/html MIME type. The HTML code then loads into the browser.
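Here is a small JavaScript sketch of how those two headers could be interpreted. I am assuming the response carried the private directive, since that is the Cache-Control directive that reserves a response for a single user's cache; the sample header values and the helper names are illustrative rather than taken from the capture.

```javascript
// Turn a Cache-Control value into a map of directives.
// Simplified: quoted values that contain commas are not handled.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true; // flag directive, e.g. "private"
    } else {
      directives[trimmed.slice(0, eq).trim().toLowerCase()] = trimmed.slice(eq + 1).trim();
    }
  }
  return directives;
}

// A shared cache (proxy or gateway) must not store private or no-store responses.
function sharedCacheMayStore(directives) {
  return !directives["private"] && !directives["no-store"];
}

// Strip parameters such as charset to get the bare MIME type.
function mediaType(contentType) {
  return contentType.split(";")[0].trim().toLowerCase();
}

console.log(sharedCacheMayStore(parseCacheControl("private"))); // false
console.log(mediaType("text/html; charset=ISO-8859-1")); // text/html
```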
The opening page contains a GIF graphic representing the Google logo. Since the GIF image is a separate resource with its own MIME type (image/gif), the browser has to make a separate request to retrieve it. Here is the information sent to Google's server to obtain the image.
Retrieving header information
In client-side JavaScript, the navigator object gives you request header details such as the browser name and version and the platform. For example, the navigator.userAgent property will provide you with the value of the User-Agent request header found in the client packets.
The program simply looks at the request header and then prints out a personalized greeting in the user's native language based on the Accept-Language settings and custom objects within the client browser. If the client browser is set to Japanese, the user will get a Japanese greeting.
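The idea can be sketched in a few lines of JavaScript. This is my own minimal version rather than a definitive implementation: the greetings table, the function names, and the sample language values are all assumptions made for illustration. It accepts a value in Accept-Language form (for example "ja, en-us;q=0.8"), orders the languages by preference, and returns the first greeting it knows.

```javascript
// Hypothetical greetings table; extend it with the languages your site supports.
const GREETINGS = { en: "Hello!", ja: "こんにちは", fr: "Bonjour" };

// Parse an Accept-Language style value into language tags, most preferred first.
function preferredLanguages(acceptLanguage) {
  return acceptLanguage
    .split(",")
    .map((item) => {
      const [tag, ...params] = item.trim().split(";");
      const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const weight = q ? parseFloat(q.slice(2)) : 1; // no q value means weight 1
      return { tag: tag.trim().toLowerCase(), weight: Number.isNaN(weight) ? 0 : weight };
    })
    .filter((entry) => entry.tag !== "" && entry.weight > 0)
    .sort((a, b) => b.weight - a.weight)
    .map((entry) => entry.tag);
}

// Return the greeting for the most preferred language we know, else English.
function greetingFor(acceptLanguage) {
  for (const tag of preferredLanguages(acceptLanguage)) {
    const primary = tag.split("-")[0]; // "en-us" -> "en"
    if (Object.prototype.hasOwnProperty.call(GREETINGS, primary)) {
      return GREETINGS[primary];
    }
  }
  return GREETINGS.en;
}

// In a browser, navigator.language exposes the user's preferred language.
// Elsewhere, fall back to a sample value.
const browserLanguage =
  typeof navigator !== "undefined" && navigator.language ? navigator.language : "en-us";
console.log(greetingFor(browserLanguage));
```

Matching on the primary subtag means that both "en-us" and "en-gb" receive the English greeting, which keeps the table small.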
Header information can also be accessed using server-side scripts. Active Server Pages (ASP) allows you to access request headers, along with other server environment variables, using the Request.ServerVariables collection.
Referencing these variables using the methods specific to each server language allows you to access most of the headers outlined in the HTTP protocol specifications. In the case of ASP, you can obtain the client browser's User-Agent using the following code:
browserUserAgent = Request.ServerVariables("HTTP_USER_AGENT")
You can easily create a program that examines whether users have a text-mode Web browser and then send them to the appropriate Web page on your site. Listing B shows an example of this technique.
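To show the shape of such a check, here is a JavaScript sketch of the same idea (not the ASP listing itself). The list of text-mode browsers and the page paths are assumptions for illustration; in ASP, the userAgent argument would come from Request.ServerVariables("HTTP_USER_AGENT").

```javascript
// Product names that begin the User-Agent of some common text-mode browsers.
// This list is an assumption for illustration and is not exhaustive.
const TEXT_BROWSER_PATTERN = /^(lynx|e?links|w3m)[\/ ]/i;

// True when the User-Agent string starts with a known text-mode browser name.
function isTextModeBrowser(userAgent) {
  return TEXT_BROWSER_PATTERN.test(userAgent || "");
}

// Hypothetical page paths; substitute the real ones for your site.
function landingPageFor(userAgent) {
  return isTextModeBrowser(userAgent) ? "/text/index.html" : "/index.html";
}

console.log(landingPageFor("Lynx/2.8.9rel.1 libwww-FM/2.14")); // /text/index.html
console.log(landingPageFor("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)")); // /index.html
```

User-Agent strings are supplied by the client and are easily changed, so treat the result as a hint rather than a guarantee and keep a link to the alternate version on each page.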
By understanding how the headers work, you can create more interesting and effective Web applications. If you want to learn more, be sure to check out the Web standards, protocols, and specifications at the W3C.