Over the last year or so, I have become involved with the W3C's HTML 5 Working Group. As a result, I have gained a new level of insight into not only the HTML 5 specification, but the HTML 4 specification as well. Along the way, I have learned a few things that I found quite surprising, despite having many years of experience in authoring Web pages and trying to write valid HTML code.
<DOCTYPE> does not matter
I know... this one is shocking. After all, doesn't the validator use the <DOCTYPE> to determine which rules to validate the document against? Yes, it does. It is natural to assume that the browser does the same: that it looks at the <DOCTYPE> and uses it to decide which parsing rules, presentation mechanics, etc. to apply to the document. The reality is, the browser does not use <DOCTYPE> to pick a set of rules at all. Instead, the browser has its built-in HTML rules, and it applies them to all documents, regardless of <DOCTYPE>. The main exception is rendering mode: browsers do inspect the <DOCTYPE> to decide between "quirks" mode and "standards" mode, and a missing or outdated <DOCTYPE> triggers quirks mode. If you are using XHTML, this may be a different story because XHTML is more rigid, but I would not be surprised if the same held true there. So if you have been choosing a <DOCTYPE> in the hope of triggering specific parsing behavior in the browser, don't bother.
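Browsers do look at the <DOCTYPE> for one thing: choosing between quirks, limited-quirks ("almost standards"), and no-quirks (standards) rendering. The HTML 5 parsing rules describe this "DOCTYPE sniffing," and the Python sketch below models a simplified version of it. Only a handful of the spec's public-identifier prefixes are included (real browsers check a much longer list), and the function name rendering_mode is hypothetical.

```python
from typing import Optional

# A few of the public-identifier prefixes that force quirks mode.
# (The real list in the HTML 5 parsing rules is much longer.)
QUIRKS_PREFIXES = (
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 4.0 transitional//",
    "-//w3c//dtd html 4.0 frameset//",
)
# HTML 4.01 loose DTDs: quirks without a system identifier,
# limited-quirks ("almost standards") with one.
HTML401_LOOSE = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)
# XHTML 1.0 loose DTDs: always limited-quirks.
XHTML10_LOOSE = (
    "-//w3c//dtd xhtml 1.0 transitional//",
    "-//w3c//dtd xhtml 1.0 frameset//",
)


def rendering_mode(name: Optional[str],
                   public_id: Optional[str] = None,
                   system_id: Optional[str] = None) -> str:
    """Simplified DOCTYPE sniffing; name is None when there is no DOCTYPE."""
    if name is None or name.lower() != "html":
        return "quirks"
    pub = (public_id or "").lower()  # the comparisons are case-insensitive
    if pub.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if pub.startswith(HTML401_LOOSE):
        return "quirks" if system_id is None else "limited-quirks"
    if pub.startswith(XHTML10_LOOSE):
        return "limited-quirks"
    return "no-quirks"


print(rendering_mode(None))    # quirks (no DOCTYPE at all)
print(rendering_mode("html"))  # no-quirks (<!DOCTYPE html>)
print(rendering_mode("html", "-//W3C//DTD HTML 4.01 Transitional//EN"))  # quirks
print(rendering_mode("html", "-//W3C//DTD HTML 4.01 Transitional//EN",
                     "http://www.w3.org/TR/html4/loose.dtd"))  # limited-quirks
```

Note that nothing here selects a different set of parsing rules; the <DOCTYPE> only nudges layout behavior.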
Browsers are much more compliant than you think
One of the persistent complaints about Web browsers (usually Internet Explorer) is that they render pages differently from one another because they are not fully compliant with the HTML spec. The reality is, browsers are much more compliant with the HTML spec than you think. And those obnoxious variances in rendering? The HTML spec is generally silent on how things should render in a Web browser. It contains a few "should" and "should not" items regarding rendering, but if one browser doesn't do what you expect, chances are that the HTML spec says nothing about the matter. Those differences in error handling and in how invalid HTML is dealt with? Again, the current HTML spec is pretty quiet about those things. Blame HTML 4, not the browser. Luckily, HTML 5 is much more specific on these points, which should eliminate many of the differences between browsers.
Compliance does less than you think
The other side of the compliance coin is that a browser's compliance with the HTML specification is little guarantee that your document will display the way you expect. Because so much of the current HTML specification is worded with "should" and "should not" (as opposed to "must" and "must not"), browser vendors have a lot of leeway in how they implement things while still remaining compliant with the spec. Again, HTML 5 is working hard to make these things clearer and to reduce these kinds of issues.
Validators cannot cover all of the bases
Another favorite tool of many Web developers is the trusty HTML validator. What many programmers don't realize is that some aspects of the specification are not machine verifiable! A validator can check most of the rules, but not all of them. In a number of places in the specifications, whether something is allowed depends on circumstances that a computer just is not well suited to detecting; for example, no validator can tell whether a <blockquote> actually contains a quotation or is merely being used to indent text. So, yes, keep using your validator, but be sure to check your code against the specification too when possible.
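Here is a minimal sketch of that gap, using the html.parser module from Python's standard library. The hypothetical images_missing_alt function performs the kind of check a validator can automate (HTML 4 requires an alt attribute on every <img>), but it has no way to judge whether the alt text that is present actually describes the image.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Records the src of every <img> that lacks an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(markup: str) -> list[str]:
    checker = AltChecker()
    checker.feed(markup)
    checker.close()
    return checker.missing


page = '<p><img src="logo.png" alt="image"><img src="chart.png"></p>'
print(images_missing_alt(page))  # ['chart.png']
```

The check correctly flags chart.png, but logo.png passes even though alt="image" tells a screen-reader user nothing. That judgment still takes a human reading the spec.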
The <q> tag
The <q> tag is supposed to be used for a short inline quotation. An analogy is: "<blockquote> is to <div> as <q> is to <span>." For a number of reasons, <q> is not widely used; when quotations are marked up in HTML, it is usually with the much more popular <blockquote> tag. So what is so strange about <q>? It is supposed to do something that no other tag does: automatically surround its contents with quotation marks. This behavior is not well specified in HTML 4, and Internet Explorer never implemented it. As a result, authors who used <q> have traditionally relied on browser detection to add the quotation marks themselves in Internet Explorer. Internet Explorer 8 will change its behavior to comply with the spec. While this is good overall, it will break pages that use that browser detection: IE 8 will add its own quotation marks on top of the ones the script inserts, so readers may see the marks twice. While the idea of a tag adding punctuation may be unusual (as far as I know, <q> is the only tag that does it), the authors currently using <q> know enough to expect it.
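The breakage is easy to see in a toy model. In the Python sketch below, reader_sees and sniffing_workaround are hypothetical helpers, and the NATIVE table simply encodes the behavior described above as an assumption: IE 7 does not add quotation marks to <q>, while IE 8 and other browsers (Firefox 3 stands in for them here) do.

```python
OPEN, CLOSE = "\u201c", "\u201d"  # curly double quotation marks


def reader_sees(content: str, native_quotes: bool, script_quotes: bool) -> str:
    """Toy model of the text a reader sees for <q>content</q>."""
    if script_quotes:  # literal marks inserted by the author's IE-only workaround
        content = OPEN + content + CLOSE
    if native_quotes:  # marks generated by the browser itself
        content = OPEN + content + CLOSE
    return content


def sniffing_workaround(browser: str) -> bool:
    """The traditional fix: add quotation marks whenever the browser is IE."""
    return browser.startswith("Internet Explorer")


# Assumed browser behavior, per the discussion above.
NATIVE = {
    "Internet Explorer 7": False,
    "Internet Explorer 8": True,
    "Firefox 3": True,
}

for browser, native in NATIVE.items():
    print(f"{browser}: {reader_sees('Hello', native, sniffing_workaround(browser))}")
```

Run as-is, this prints “Hello” for IE 7 and Firefox 3 but ““Hello”” for IE 8, because the name-based workaround still fires once the browser starts adding the marks itself.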
This is just a sampling of some of the oddities in HTML that I have found. I hope that you find this information useful!
(For more insight into HTML 5, check out my interview with HTML 5 Editor Ian Hickson.)
J.Ja

Disclosure of Justin's industry affiliations: Justin James has a working arrangement with Microsoft to write an article for MSDN Magazine. He also has a contract with Spiceworks to write product buying guides.
Justin James is the Lead Architect for Conigent.