I recently wrote about using Fiddler to examine HTTP traffic for debugging purposes. To better use the information, it helps to know a bit about the guts of the HTTP specification. In this column, I'll focus on the common items in an HTTP request, so you'll know what the information means when you see it in a debugger.
An HTTP request begins with the request line, which is followed by up to three kinds of headers: general headers, request headers, and entity headers. After the headers comes the message body. The request line specifies the method (such as GET or POST), the URI requested, and the version of HTTP to use (the current standard is 1.1). HTTP 1.1 defines eight methods: OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, and CONNECT.
The URI in the request line is not necessarily the full URI (an absolute URI); for example, you can specify the host in a header and use a relative URI. In fact, the specification says you should use the absolute URI only when communicating with a proxy server; normally you send the absolute path in the request line and name the host in a Host header.
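To make the two forms of the request line concrete, here is a minimal Python sketch. The host www.example.com, the path /index.html, and the helper name parse_request_line are all made up for illustration; the sketch also assumes a well-formed request line with exactly two spaces in it.

```python
def parse_request_line(line):
    """Split a request line into (method, request URI, HTTP version).

    Hypothetical helper: assumes the line is well formed, with single
    spaces separating its three parts.
    """
    method, uri, version = line.rstrip("\r\n").split(" ")
    return method, uri, version


# Typical request to an origin server: an absolute path in the request
# line, with the host named separately in a Host header.
origin_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)

# Request sent through a proxy: the request line carries the absolute URI.
proxy_request = (
    "GET http://www.example.com/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "\r\n"
)

# The request line is always the first line of the request.
origin_parts = parse_request_line(origin_request.split("\r\n")[0])
proxy_parts = parse_request_line(proxy_request.split("\r\n")[0])
```

In both cases the method and version are the same; only the URI portion changes, which is what you will see when comparing direct and proxied traffic in a debugger.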
The general header has basic information about the connection, such as cache details, the date, and how the data is encoded during transfer. The request header is specific to what a request needs; it defines things such as what languages, encodings, and character sets the client is willing to accept. The request header also specifies the hostname being accessed (this is very important for virtual Web servers), the range (for requests that fetch only part of a resource, as in download resumption), the referrer (the URL that led you here; the specification spells the field name "Referer"), and the user agent. If you are writing code that accesses HTTP directly, I suggest that you always populate the user agent; I find that a number of servers and services reject requests that lack a user agent, though they will accept one that you make up. There are other fields in the request header as well, but these are the most important. All of these message headers are sent as name/value pairs.
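Because the headers are just name/value pairs, they are easy to pick apart by hand. The following is a rough Python sketch, not a complete parser: the helper parse_headers, the user agent string ExampleClient/1.0, and the example.com URLs are invented for illustration, and the sketch ignores repeated headers and other edge cases.

```python
def parse_headers(header_block):
    """Turn a block of 'Name: value' lines into a dictionary.

    Hypothetical helper. Header names are case-insensitive, so they are
    lowercased here. Repeated headers are not handled; a later value
    simply overwrites an earlier one.
    """
    headers = {}
    for line in header_block.split("\r\n"):
        if not line:
            continue  # skip the blank line that ends the header section
        # Split on the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw_headers = (
    "Host: www.example.com\r\n"
    "User-Agent: ExampleClient/1.0\r\n"
    "Accept-Language: en-US\r\n"
    "Accept-Encoding: gzip\r\n"
    "Range: bytes=1000-\r\n"
    "Referer: http://www.example.com/start.html\r\n"
    "\r\n"
)

headers = parse_headers(raw_headers)
```

After this runs, headers maps each lowercased field name to its value, so headers["user-agent"] holds the made-up client name and headers["range"] holds the byte range a resumed download would ask for.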
The entity headers provide information about the entity body (which is really just the message body after being decoded) that the request will be providing (such as POST data or a file that is being uploaded). This information includes the content length in bytes (which is critical and must be accurate) and the content type (which is also important and must be accurate). If there is no entity following the headers (as in a GET request), these entity headers are not needed. You can also make up your own entity headers, as long as the receiving server knows what to do with them.
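A common way to get the content length wrong is to count characters instead of bytes. This short Python sketch shows the difference; the sample text and the choice of UTF-8 are assumptions made purely for illustration.

```python
# Sample body text containing two non-ASCII characters (illustrative only).
body_text = "naïve café"

# The body is sent as bytes, so encode it first and measure the result.
body = body_text.encode("utf-8")

entity_headers = {
    "Content-Type": "text/plain; charset=utf-8",
    # Content-Length counts bytes in the encoded body, not characters.
    "Content-Length": str(len(body)),
}

character_count = len(body_text)  # 10 characters
byte_count = len(body)            # 12 bytes: "ï" and "é" take two bytes each
```

Sending a Content-Length of 10 here would understate the body by two bytes, which is the sort of mismatch worth checking for when a POST misbehaves.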
After all of the headers comes the message body, which is the encoded content of the entity. So if you are using Base64 encoding on a file you are uploading with PUT, the message body is that file after being encoded.
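Putting the pieces together, here is a sketch in Python of what a complete PUT request with a Base64-encoded body might look like on the wire. The file contents, the path /uploads/hello.txt, the host, and the user agent are all made up, and the sketch assumes the receiving server already expects Base64; how the server learns that is outside the scope of this example.

```python
import base64

# Stand-in for the contents of the file being uploaded (illustrative only).
file_bytes = b"hello, world"

# The message body is the file after encoding, not the raw file.
message_body = base64.b64encode(file_bytes)

# Request line, headers, a blank line, and then the message body.
request = (
    b"PUT /uploads/hello.txt HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: ExampleClient/1.0\r\n"
    b"Content-Type: text/plain\r\n"
    # Content-Length describes the encoded body that is actually sent.
    b"Content-Length: " + str(len(message_body)).encode("ascii") + b"\r\n"
    b"\r\n"
) + message_body

# Everything before the first blank line is the request line plus headers;
# everything after it is the message body.
head, _, body = request.partition(b"\r\n\r\n")
```

Note that the content length is that of the encoded body (16 bytes here), not of the original 12-byte file.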
There is not much more to HTTP requests than what I've described. HTTP requests are not nearly as mysterious as they appear at first, and the specification is pretty easy to understand. Armed with this knowledge, your HTTP debugging sessions should be a bit smoother.
In next week's column, I'll examine the data that comes back as the HTTP response.
Justin James is the Lead Architect for Conigent.