
Structure of an HTTP request

HTTP requests are not as mysterious as they may seem. Justin James helps make them more accessible by providing an overview of the common items in an HTTP request.

I recently wrote about using Fiddler to examine HTTP traffic for debugging purposes. To better use the information, it helps to know a bit about the guts of the HTTP specification. In this column, I'll focus on the common items in an HTTP request, so you'll know what the information means when you see it in a debugger.

The HTTP request begins with the request line, followed by the headers, which fall into three categories: general headers, request headers, and entity headers. After those comes the message body. The request line specifies the method type (such as GET or POST), the URI requested, and the version of HTTP in use (the current standard is 1.1). Here is the full list of defined method types:

  • OPTIONS
  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • TRACE
  • CONNECT

The URI in the request line is not necessarily the full URI (an absolute URI). The specification says to use the absolute URI only when communicating with a proxy server; normally, you use the absolute path (which always begins with "/") and specify the hostname in a Host header.
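To make this concrete, here is a minimal sketch of what a request actually looks like on the wire, written in Python so you can send it yourself and inspect the response. The host name, path, and port are placeholders, not anything you must use:

    import socket

    # A minimal HTTP/1.1 GET request built by hand to show the wire format.
    # The request line carries the method, the absolute path, and the HTTP
    # version; the Host header carries the hostname (example.com is just a
    # placeholder here).
    request = (
        "GET /index.html HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line ends the headers; a GET has no message body
    )

    with socket.create_connection(("example.com", 80)) as conn:
        conn.sendall(request.encode("ascii"))
        print(conn.recv(4096).decode("ascii", errors="replace"))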

The general headers carry basic information about the connection, such as cache details, the date, and how the data are encoded during transfer. The request headers are specific to what a request needs; they define things such as which languages, encodings, and character sets the client is willing to accept. The request headers also specify the hostname being accessed (this is very important for virtual Web servers), the range (for requests that fetch partial content, for things like download resumption), the referrer (the URL that led you here, spelled "Referer" in the spec), and the user agent. If you are writing code that accesses HTTP directly, I suggest that you always populate the user agent; I find that a number of servers and services reject requests without a user agent, even if it is one you make up. There are other request headers as well, but these are the most important. All of these headers are simply name/value pairs.
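If you want to experiment with these request headers, here is a brief sketch using Python's standard http.client module; the host, path, and header values are illustrative assumptions, not required values:

    import http.client

    conn = http.client.HTTPConnection("example.com")  # hypothetical host
    conn.request("GET", "/big-file.zip", headers={
        "Accept-Language": "en-US,en;q=0.8",  # preferred response languages
        "Referer": "http://example.com/downloads",  # note the spec's spelling
        "Range": "bytes=0-1023",  # ask for only the first kilobyte
        "User-Agent": "MyHttpClient/1.0",  # always send one, even if made up
    })
    response = conn.getresponse()
    print(response.status, response.reason)  # expect 206 if Range is honored
    conn.close()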

The entity headers describe the entity body (which is really just the message body after being decoded) that the request provides, such as POST data or a file that is being uploaded. They include the content length in bytes (which is critical and must be accurate) and the content type (which is also important and must be accurate). If no entity follows the headers (as in a GET request), the entity headers are not needed. You can also make up your own entity headers, as long as the receiving server knows what to do with them.
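As a brief sketch of the entity headers in action, here is a small POST using Python's http.client; the form fields, URL, and host are made up for illustration:

    import http.client

    body = b"name=Justin&topic=HTTP"  # url-encoded form data (illustrative)
    conn = http.client.HTTPConnection("example.com")
    # Content-Type and Content-Length are entity headers: they describe the
    # body that follows, and the length must match the byte count exactly.
    conn.request("POST", "/submit", body=body, headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    })
    print(conn.getresponse().status)
    conn.close()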

After all of this is the message body, which is the encoded content of the entity. So if you are using Base64 encoding on a file you are uploading with PUT, the message body is that file after being encoded.
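Following that Base64/PUT example, here is one way it might look in Python; the file name, upload path, and host are assumptions for the sake of the sketch:

    import base64
    import http.client

    # Base64-encode the file first; the encoded bytes become the message body.
    with open("report.pdf", "rb") as f:  # illustrative file name
        encoded = base64.b64encode(f.read())

    conn = http.client.HTTPConnection("example.com")  # hypothetical host
    conn.request("PUT", "/uploads/report.pdf.b64", body=encoded, headers={
        "Content-Type": "application/octet-stream",
        "Content-Length": str(len(encoded)),  # length of the *encoded* body
    })
    print(conn.getresponse().status)
    conn.close()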

Summary

There is not much more to HTTP requests than what I've described. HTTP requests are not nearly as mysterious as they appear at first, and the specification is pretty easy to understand. Armed with this knowledge, your HTTP debugging sessions should be a bit smoother.

In next week's column, I'll examine the data that comes back as the HTTP response.

J.Ja

About

Justin James is the Lead Architect for Conigent.

25 comments
PhilippeV

What should have been said is that there may be headers in any one of the three categories (and actually there are other headers, specifically built to work with sessions, cookies, cacheability in proxies, security...).

The article also says incorrectly that the first line of the request is the method (one of those listed verbs in capitals) optionally followed by an entity URL (required for methods that involve entities); actually it MUST be followed by the protocol identifier ("HTTP/1.0" for legacy requests, not recommended, or "HTTP/1.1" for the current protocol, which allows persistent connections, pipelined requests, chunked entities, progressive transfers, and clean management of sessions and cacheability in proxies).

It also incorrectly says that it should contain the full absolute URL. This is ONLY true if connecting explicitly to a proxy to perform a request to a designated external site in order to query one of its entities, and before that, you have to establish a session with the proxy. The standard URL you put in the query is the absolute URL without the protocol specifier ("http:" or "https:") and without the hostname (prefixed by "//"). So the URL MUST start with a "/" (it is invalid to request a relative URL starting with "./" or "../"), and may also contain a query string (starting with "?", typical of GET requests used to submit simple small forms without encoding any content body in the request).

This article forgets many details about the REQUIRED support of the chunked format for sending or receiving content bodies in a progressive way (without defining the "Content-Length:" header explicitly, for example when querying streamed resources whose result length cannot be determined before full completion of the request): it is required for HTTP/1.1 compliance.

Important request headers that are frequently needed by servers are "Accept-Language:" (the requester specifies the list of languages in which it prefers to receive a response) and "Accept:" (the requester specifies the list of content types that it supports, notably whether it supports HTML, plain text, image formats, video formats, plugin-specific formats, and the server will select an appropriate response). These request headers will have the effect of changing the actual entity that will be returned by the GET or POST query, so if you don't want to retrieve the entity content, but just see if it has changed or if it exists, using the "HEAD" request, you'll need to supply them for the HEAD request as well (even though the response to a HEAD request should not contain any entity).

There's no fundamental difference between the POST and PUT methods, except the fact that PUT is typically only used by website administrators (or in the WebDAV protocol based on HTTP to support remote filesystems "in the cloud") in order to CREATE a new entity on the server if it does not exist (so a PUT request will not get a "404 Page not found" response, but possibly only an authorization error).

An important header that all HTTP/1.1 clients and servers must support is the "Connection:" header, used to create and manage persistent sessions. It is used on both sides: by the client in its requests, which may be pipelined over the same HTTP session, and by the server in its response (notably in case of error, to indicate that it has closed the HTTP session prematurely). In HTTP/1.1, sessions are persistent by default, and it's up to the client to terminate the session if it no longer needs to perform another request.

As in HTTP/1.1 multiple requests can be pipelined (and not necessarily served in the same order by the server), it is common to also include a request ID in the HTTP request, which the server will include verbatim in its response. Without it, pipelined requests will be answered in the same FIFO order by the server, even if this causes additional delays while the server prepares the responses.

pgit

Sure, you can 'make up' a user agent, and indeed some servers are (needlessly) anal about having the information, but you don't really want to just make one up, do you? Isn't it advisable to use the most common agent of the moment? (which might be IE6, or maybe IE8 by now) Of course if you're dealing with one URI and you know its 'druthers, go for it. But if you deal with variable resources you might be better off ordering vanilla. I've seen servers fall over when given a less common user agent spec. Took me forever to figure out, but on one occasion after things just didn't want to work I finally (out of frustration) set the agent to IE6 running on Vista and presto... we have data. (edited for spelling)

jck

I've dealt with HTTP. It's rather simple. It's the document content that is what gets absurdly complex, especially nowadays.

dgurney

How do you set out to talk about the composition of a message and then not show a single portion of one as an example? Instead of showing the three header lines, you just say they exist. And so on with the rest of this pointless verbiage. This is absolutely worthless to anyone who'd need to know about HTTP, and of course worthless to anyone who doesn't. grade: F

Justin James

My first foray into HTTP was implementing a substantial amount of the parsing for it in Perl, and I learned a lot. Since then, I've been glad to have learned HTTP's details. Do you need to deal with raw HTTP on a regular basis? J.Ja

Justin James

Re: additional headers - Those additional headers you mention are outside of the HTTP spec itself. For example, the word "cookie" does not appear in RFC 2616 at all. However, I should have talked about them anyway, since they are so common.

You are right that I omitted the HTTP version number when discussing the request line; it was something I had in my head to mention, but by the time I got to that part of the sentence, I had lost it entirely. My fault completely on that, since I didn't pick it up in my proofreading either.

You have misread what I wrote regarding the "absolute URI"; I said very specifically that it's only used when talking to a proxy, and that otherwise the "absolute path" must be used.

I left out the details of chunking for space concerns; I've been trying to reduce the length of articles lately, and the gritty details of chunking (which need to be supported, but are not used in the things that you usually see come across a packet capture) were omitted for that reason.

The additional details you've provided are appreciated and great additions to the article, thanks! J.Ja

Justin James

... some servers decide to do some goofy things if you use IE (or another known browser) as your user agent. :( It's definitely a case of "test, test, test". J.Ja

Justin James

I think that's a fair criticism, to be honest. Some examples of a typical request could have been included, and they certainly would have helped. To be perfectly honest, since the redesign of the site, I've been very, VERY shy about doing *any* kind of samples; the formatting is awful! If I include a sample, then depending on where in the article it appears (especially by that inset ad), all sorts of word wrapping occurs, and that word wrapping destroys the accuracy of a sample in this kind of situation. I'm not making excuses for myself here (I should have found a way to show an example), I'm just trying to explain what's been happening. It's also why I am including fewer and fewer code samples in the articles as well. J.Ja

seanferd

Dealing with a list of three consecutive elements, you need an example? There is a link to RFC 2616 located in the article text.

Spitfire_Sysop

I was expecting to see an example of the header and an explanation of what the different parts are. Somehow, I don't think that the browser sends out a bullet list. Where is the actual structure? Not even a link?

mattohare

I might have liked an example or two of requests to see how all of this plays out. :D You're right. It's good to know how something works when programming to or with it. It avoids making some really stupid mistakes. (I'm going through that with some XML I'm doing now. Still a lot I don't know about the whole thing.)

pgit

I don't have a lot of experience in this area, I guess I've been lucky and haven't suffered any of that goofiness. =|

AnsuGisalas

KISS. Now you just need an app to make that "text editor to image" drag'n'drop nice and easy, right?

pgit

There's a number of functionalities that went bye-bye with the change of forum software. You used to be able to quickly see what messages you've already read versus the new ones since your last visit. Gone. And I had noticed code examples and illustrations declined in # around the same time period. Now I know why. (pita for you authors) I'm sure management has its good reasons.

seanferd

But that doesn't change the fact that people are inferring whatever they want from the title. The criticism would have been better made by offering the same suggestion without basing it on the poor, much-abused strawman.

jhoward

The purpose of the 'movement' JJ referred to was essentially to get people to stop using non-standard HTML/CSS on pages that only worked on I.E. or looked horribly designed in alternate browsers, making the alternate browsers seem like they were the ones that were broken. The other half of this is that pages made with standards-compliant code didn't always look right in I.E. Most people do not care about what browser they are using. If you tell them that it is broken and to use browser X instead, some of them will convert - not surprisingly it wasn't a significant amount. The idea here was to get enough people to convert so that one or all of the following could happen:

  • web pages using non-standard code would need to be revised or just not work correctly
  • I.E. would have to become standards compliant
  • Alternate browsers would gain momentum

In the end it all comes down to the fact that the majority of users do not care what the standards are. They only care if their browser works with the least amount of effort possible. I.E. is still heavily integrated into Windows and it involves "effort" to change browsers on Windows. Although minimal to even the most novice TR reader - more than most are willing to do. Easier to just not visit the page again.

pgit

...complete with boilerplate code to do it... EXTREMELY obnoxious). I bet. I wasn't aware of this 'movement.' I would never imagine people could be so petty and short sighted. Was that all there was to it? Just befuddling IE users? Or was it part of a larger crusade? Did this have a name or battle cry?? I wonder what it would take to make reporting the agent fluid, that is first query the server to ascertain the site developer's browser of choice, then offer that one up...? (looking at the previous sentence now I'm thinking WAY too much work, if not impossible)

Justin James

Some folks try to be cute and serve up entirely different HTML and/or JavaScript & CSS to accomplish browser-specific things. The content itself is usually identical. But then you get people who are on an anti-IE crusade and serve a page to IE that basically says, "go use a standard browser if you want to see this page" (this was an actual movement a few years ago, complete with boilerplate code to do it... EXTREMELY obnoxious). Imitating Chrome or Safari is probably the safest thing to do (the people who do user-agent based switching seem to overwhelmingly discriminate against IE). I *do* agree, 100%, that using a non-standard user agent can burn you from time to time, and thankfully, imitating a browser's user agent is much less likely to burn you than making up your own, but sadly, the potential is there regardless of what you do. :( J.Ja

Justin James

The mini PDF just isn't happening (technical limitations), and it would still be in a tiny window. :( The other idea, using my host, isn't a bad one. I may just go back to putting stuff full up over there. I brought up this issue at a legal level ages ago, but it went nowhere, unfortunately. J.Ja

AnsuGisalas

Viewable as an embedded image, and with the author-fixed formatting of a PDF. There ought to be.

pgit

what if the ONLY access to the specific URL is via the link in the article? It should be possible to have your server only accept links to those files originating from the TR referrer. (seems iptables could handle it) It would be their exclusive use, it wouldn't matter that the code is under MIT license. People copying it would say "I got this great snippet of free code from Tech Republic" rather than "..from Justin James." Wouldn't that be basically the only hook they could hang a hat on FOSS anyway? The effect would be "TR = helpful, valuable" etc. What more could they want?

AnsuGisalas

How about, can you quote something without having rights to it? Under fair use, or something? Because if you can do that, you can likely also quote yourself without reassigning rights... eh, I dunno. So, to cut through the problem, we need a gang of knee-cappers to go talk to the tech crew at TR? Or would that mean war?

Justin James

... I've put the full code up on my personal Web server. It's not a bad alternative, and it's always nice to feed myself some link juice. However, it brings up some licensing/copyright problems. You see, my contract assigns the contents of my articles 100% to TechRepublic (for the obvious reason... makes no sense to pay me to write something and then have me sell it to someone else too). The TechRepublic TOS has a blanket no-reproduce statement in it (except for certain exceptions, such as personal use). As a result, while I am positive no one would ever hammer me on it, I felt that I was skirting the contract + TOS by doing it, even if what I was legally doing was publishing the code under the MIT license myself, and then using a licensed copy for my articles, because then I'd be assigning TR rights that the MIT license forbade. J.Ja

pgit

You could put plain text code (copy and paste-able) on some service like pastebin and provide a link. You can name the link something that would maximize search hits. A lot more work than having a text window available to you writers integrated with the publishing software here on these forums, but at least there'd be code samples.

Justin James

... it does NOT work out well. For one thing, with code samples, people cannot copy/paste the code. Search engines don't pick it up either (although that is a secondary, if not tertiary, concern). And I've received complaints in the past that the screen captures are hard to read, because they don't automatically scale with people's settings. Believe me... I've tried every imaginable trick on the code samples; nothing has worked out the way it should. On the latest redesign, I begged them, *in person* at last year's TR event in Kentucky, to give us the standard code sample box like every other site on the planet has. The TR staff is 100% behind us, I may add; they've been pushing for years on this too. They know that this is a big problem. The issue is the tech team; I do not know how they prioritize or authorize requests, but this request has never been addressed in the 5+ years that I've been writing for TechRepublic, and in fact, as they redesign and keep making the content area more narrow, it's actually gotten worse. J.Ja
