
Five HTML oddities that you may not know

When working with HTML, Justin James says browsers are much more compliant than Web developers think. Find out the other surprising information he has learned about HTML since becoming involved with the W3C's HTML 5 Working Group.


Over the last year or so, I have become involved with the W3C's HTML 5 Working Group. As a result, I have gained a new level of insight into not only the HTML 5 specification, but the HTML 4 specification as well. Along the way, I have learned a few things that I found quite surprising, despite having many years of experience in authoring Web pages and trying to write valid HTML code.

<DOCTYPE> does not matter

I know... this one is shocking. After all, doesn't the validator use the <DOCTYPE> to determine which rules to validate the document against? Yes, it does. The implication is that the browser does the same: it looks at the <DOCTYPE> and decides, based upon it, which parsing rules, presentation mechanics, etc. are applied to the document. The reality is, the browser completely ignores <DOCTYPE>! If you are using XHTML, this may be a different story because XHTML is more rigid, but I would not be surprised if browsers treated XHTML the same way. If you have been using <DOCTYPE> in the hope of triggering a browser to do certain things based on it, don't bother. What happens instead is that the browser has its built-in HTML rules, and it applies those rules to all documents, regardless of <DOCTYPE>.

Browsers are much more compliant than you think

One of the persistent complaints about various Web browsers (usually Internet Explorer) is that, because they are not fully compliant with the HTML spec, they render things very differently. The reality is, browsers are much more compliant with the HTML spec than you think. And those obnoxious variances in rendering? The HTML spec is generally silent as to how things should render in a Web browser. There are some "should" and some "should not" items regarding rendering in the spec. But chances are, if one browser doesn't do what you expect it to and you check the HTML spec, you will find that it is silent on the matter. Those differences in error handling and how invalid HTML is dealt with? Again, the current HTML spec is pretty quiet about those things. Blame HTML 4, not the browser. Luckily, HTML 5 is much more specific on these points, which should eliminate many of the differences between browsers.

Compliance does less than you think

The other side of the compliance coin is that a browser's compliance with the HTML specification is little guarantee that your document will display the way you expect it to. Because so much of the current HTML specification is worded with "should" and "should not" (as opposed to "must" and "must not"), browser vendors have a lot of leeway in how things are done while still remaining compliant with the spec. Again, HTML 5 is working hard to make these things clearer and to reduce these kinds of issues.

Validators cannot cover all of the bases

Another favorite tool of many Web developers is the trusty HTML validator. What many programmers don't realize is that some aspects of the specification are not machine verifiable! In most cases a validator can do the job, but in some it cannot. There are a number of instances in the specifications where whether something is allowed depends on circumstances that a computer just is not well suited to detecting. So, yes, keep using your validator, but be sure to check your code against the specification too when possible.
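To make the distinction concrete, here is a minimal sketch in Python (not how any real validator is implemented). HTML 4 requires an alt attribute on every <img>, and a program can check that the attribute is present. Whether the alt text actually describes the image is the kind of rule no program can verify. The class and function names are made up for this illustration.

```python
from html.parser import HTMLParser


class ImgAltChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Presence of alt is machine-checkable; its quality is not.
        if tag == "img" and "alt" not in attributes:
            self.missing_alt.append(attributes.get("src"))


def images_missing_alt(html):
    """Return the src values of <img> tags that lack an alt attribute."""
    checker = ImgAltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing_alt
```

An image with alt="asdf" passes this check just as easily as one with a thoughtful description, which is exactly why the validator's verdict still needs a human reading of the spec behind it.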

The <q> tag

The <q> tag is supposed to be used for an inline run of text that represents a quotation. An analogy is: "<blockquote> is to <div> as <q> is to <span>." For a number of reasons, <q> is not widely used. When quotations are marked up with HTML, it is usually with the much more popular <blockquote> tag. So what is so strange about <q>? As a tag, it is supposed to do something that no other tag does: It is supposed to automatically surround its contents with quotation marks. This behavior is not well specified in HTML 4, and Internet Explorer never implemented it. As a result, authors who used <q> have traditionally put in browser detection to add the quotation marks in Internet Explorer. Internet Explorer 8 will update its behavior to comply with the spec. While this is good overall, it will break code that relies on browser detection. While the idea of a tag adding punctuation may be unusual (as far as I know, <q> is the only tag that does it), the authors currently using <q> know enough to expect it.

This is just a sampling of some of the oddities in HTML that I have found. I hope that you find this information useful!

(For more insight into HTML 5, check out my interview with HTML 5 Editor Ian Hickson.)

J.Ja

Disclosure of Justin's industry affiliations: Justin James has a working arrangement with Microsoft to write an article for MSDN Magazine. He also has a contract with Spiceworks to write product buying guides.


About

Justin James is the Lead Architect for Conigent.

63 comments
Holod

Why do you think that <q> is not widely used? I always try to use the <q> tag to highlight quotations in the text, because according to HTML tutorials the content of the tag is displayed in the browser in quotes. Internet Explorer understands the tag, but does not add the quotes. And that is very important to me.

jimmyreed4tech

Would very specific "browser detection" code not have to be updated when a new version of IE is released?

Spiritusindomit

But fanboys don't want to hear it, and I don't envy the ranting you'll have to endure.

ken

What a waste of time. Why even write an article that contains so little information, especially since the first bit is wrong? The q tag? Hasn't it been deprecated along with blockquote? And your analogy about blockquote is to div as q is to span doesn't work.

fjanon

I wish there was a "Thumbs Down" for this post.

mauco

So what's the conclusion: should we continue using DOCTYPE or not?

Beauregard T. Shagnasty

..all make a great difference. If you use a proper (hopefully Strict) DOCTYPE, and your document validates at http://validator.w3.org/ and http://jigsaw.w3.org/css-validator/validator.html you stand a much better chance for the same display among browsers. The reason is due to the way different browsers handle error correction. If there are no errors, the browser does not have to guess what you really wanted. You mentioned XHTML. Have you ever used it *served properly* as "application/xhtml+xml" instead of the HTML "text/html"? Go ahead, test versions of Internet Explorer here: http://tekrider.net/html/doctype.php

martinlatter

DOCTYPE is very important for Internet Explorer. The quirks mode it triggers or bypasses can cause great differences in web page layouts. I did the following code for IE6 in 2003. Including or removing the (non-XHTML) DOCTYPE continues to make all the difference in IE7.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Index page for mockups</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css">
<!--
body{background:#fff;font:13px/24px verdana,arial,sans-serif;}
a{display:block;text-decoration:none;}
a:hover{background:#ccf;}
div#cont{margin:60px auto 0 auto;width:150px;}
p#title{text-align:center;font-weight:bold;color:#06c;}
p#menu{border:4px double #99f;padding:15px;margin:0 auto 0 auto;width:130px;}
p#menu a{color:#336;}
-->
</style>
</head>
<body>
<div id="cont">
<p id="title">Index for mockups</p>
<p id="menu">
<a href="google_bar.html">Google choices</a>
<a href="oa.html" style="font-style:italic;">OA magazine</a>
<a href="advsearch.html">Advanced search</a>
<a href="info.html">Info page example</a>
</p>
</div>
</body>
</html>

rdcpro

The browser incompatibilities and spec problems are not so much about HTML, but about how the browsers do or do not handle CSS. HTML is silent on formatting issues for a good reason. Your HTML markup should be semantic, and you deal with formatting using CSS. Regards, Mike Sharp

moyashi

DOCTYPE makes a huge difference in how CSS renders, especially switching between some sort of html4 and some sort of xhtml1.

mattohare

It's easy to microsoft/firefox-bash over standards issues, but i couldn't believe that either could be as 'wrong' as it seemed. I also had some clue when half the 'standards' we use are called 'recommendations'. (Not all that different from all of the 'beta' web applications that never seem to get to production.) One question I have about DOCTYPE. We all know how traditions and practices can evolve. I can't help but expect a current or future browser to suddenly start using it when available. We know, with evolving web applications, that http and html limit us quite a bit. I think I'll keep the DOCTYPE tags in case some new doctype comes along and goes into these browsers.

zdnet

Some CSS stuff doesn't work properly if you don't have the DOCTYPE because then IE goes into 'quirks' mode. My favorite quirk was always the one where Mozilla wouldn't render anything if the table tags weren't perfect (if you had an extra TD or forgot to close one). I spent many hours on pages over that bug.

davids

If you're using tables for page layout, the table attribute 'height' doesn't work if the Doctype is strict. In IE or Firefox, this code will display a blue bar across the bottom of the screen. It will look the same if you add a 'Transitional' doctype. If you add a 'Strict' doctype, the blue bar will appear just below the top of the page.

Beauregard T. Shagnasty

..especially if you only sniff for versions of IE. What of those using other browsers? If you write only for an intranet, you may have control over what your visitors use, but for the WWW, why would you want to limit visitors? Case in point: my wife uses Hotmail for some of her emailing. She uses the latest version of the SeaMonkey browser on Ubuntu. About a month ago, a page appeared at Hotmail login advising her to upgrade her browser to something modern - Internet Explorer, Firefox, or Safari. Then earlier this week Microsoft changed something else and she can no longer reply to any email, nor select Plain Text or RTF. In effect, her account has become useless. We also tried using the latest Firefox and Opera with Ubuntu. Doesn't work, not even if spoofing the UA string. I dusted the cobwebs off an old PIII with W2K/IE6 and that doesn't work either. Browser sniffing is doomed to failure, and annoys your visitors - who will go shop somewhere else. :-/

Justin James

To be honest, I'm not sure. It would depend on the code, and how IE decides to report itself. I will say this, from what the IE team has said on the list, I would be VERY cautious about any browser detection script which couldn't tell IE 8 from the previous versions. J.Ja
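For readers wondering what "telling IE 8 from the previous versions" might look like, here is a rough sketch in Python of a server-side check. It assumes Internet Explorer identifies itself with an "MSIE <version>" token in its User-Agent string and that IE 8 reports "MSIE 8.0"; how a given IE build actually reports itself is exactly the uncertainty mentioned above. The function names are hypothetical.

```python
import re


def ie_major_version(user_agent):
    """Return the IE major version found in a User-Agent string, or None."""
    match = re.search(r"MSIE (\d+)", user_agent)
    return int(match.group(1)) if match else None


def needs_manual_quotes(user_agent):
    """True only for IE versions before 8, which do not add quotes around <q>."""
    version = ie_major_version(user_agent)
    return version is not None and version < 8
```

A script that only tests for the substring "MSIE" would keep adding its own quotation marks in IE 8 and produce doubled quotes, which is the breakage the article warns about.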

Justin James

q is not deprecated. And the analogy is quite correct. Blockquote is a block element, as is div. q is an inline element, as is span. Please explain how that is wrong. J.Ja

Beauregard T. Shagnasty

Actually, neither q nor blockquote are deprecated elements. Note that blockquote appears on these lists as wrongly used solely for indentation, but not for its true purpose of displaying quoted material. http://www.nysforum.org/accessibility/resources/nyspolicy/deprecated-discouraged.html http://webdesign.about.com/od/htmltags/a/bltags_deprctag.htm http://en.wikipedia.org/wiki/Blockquote blockquote and div are both block-level elements; q and span are both inline elements. Otherwise, there is no relation. :-)

Justin James

As others have said, DOCTYPE is important, but not for the reasons that you would think that it should be. It does trigger "quirks mode", "standards mode" etc. on browsers, but it does *not* tell the browser specifically, "parse this according to the spec that I indicate". That's the worst part of the DOCTYPE mess, as far as I am concerned. It has the appearance of doing one thing, and that is *not* done, while something similar is actually what happens. J.Ja

Justin James

You are right on about XHTML. It is *often* served as text/html, which causes differences in how it is handled. As you point out, the error handling will be significantly different, and that's just the start. J.Ja

Justin James

The point is that in *HTML*, the DOCTYPE is not really used. It is a "magic talisman", or a "lucky charm". However, browsers choose to enter various modes of interpretation based upon DOCTYPE, that do *not* align 1-to-1 with actual HTML specs, and are more along the lines of CSS handling and default stylings than anything else. J.Ja

Justin James

Mike - Based on the confusion, I worded it unclearly. Your summary is much better than mine! J.Ja

maharawj

As the above comments prove, our man is talking about HTML oddities, not CSS. We jump to the conclusion that things are displayed differently because of the doctype and that it is important. Doctype is important only to CSS. If there is something wrong with a plain HTML document, then it is because HTML is "silent" about that issue. When I first read this article I was WTF! But then this comment helped me understand it.

Justin James

That's very true. But it really doesn't make much of a difference on how the HTML itself gets parsed, for better or for worse. J.Ja

Jaqui

if you are using XHTML instead of HTML. the xml foundation of XHTML doesn't work well with IE < 7 unless you throw it into quirks mode. and then, the quirks mode trigger resolves many issues if you use an xml version=1. declaration before the doctype.

Justin James

... what happens is that based upon the DOCTYPE (and some other factors, I believe), the browsers implement things like "quirks mode", "standards mode", and "almost standards mode". Yup, that's right... three different "modes" for interpreting a document! It's completely insane. One would *think* that putting in a DOCTYPE would cause the document to be interpreted according to that specification, but it doesn't. As was said in the HTML 5 email list, DOCTYPE is a "magic talisman". :( So yeah, in a way it makes a difference, just not a logical (nor a *defined*) difference. J.Ja

Justin James

I am very against it as well. I've accepted the fact that certain things will look a bit different on different browsers, it's not the end of the world to me, since it is usually whitespace/spacing items, and since I use liberal amounts of whitespace in my designs anyways, it doesn't hurt them. I gave up on the idea of having precision control over HTML presentation a long time ago, and I learned to work around it. Since I also avoid client-side scripting (except for minor things that are also done server side like validation, or minor pieces of "bling" that don't matter if they don't work), differences in how things are done in JavaScript don't bother me either. That being said, I know *why* people do sniffing. But it comes down to a philosophical, personal opinion of "what is HTML well suited for?" I don't think HTML is a great base for applications (Flash, Silverlight, native code, etc. are better for that) and I don't think that it is a good choice for precision in layout (PDF is the gold standard there), so I try to use it for documents with a modicum of user interaction. But that's all personal opinion, and I know that most developers do not have the luxury to take this stance, based on what their bosses are demanding they do. :( J.Ja

Beauregard T. Shagnasty

I typed blockquote in that post with the left and right brackets surrounding the word, out of habit. Please translate where each indent occurs is the word 'blockquote'. Heh.

mattohare

I think the article brought up some very important issues. Maybe JJa didn't word things in the best way, but he still worded it better than two-thirds of the technical books you'll find in Powell's Portland, Borders, Easons, Barnes & Noble, and Waterstones. Now, I wish we could get back to the topic of what other rendering quirks there are. There must not be any, if all people can do is flame about a wee mis-writing. Now, pardon me as I check who sells asbestos suits. I think I need to have one sent to a friend of mine in the Carolinas.

chris

what "HTML" things does anyone care about these days anyway? Most responses and issues are CSS. Of course we often think of them together somewhat. I am assuming everyone is writing XHTML compliant code here by the way. Why wouldn't you :-)

Beauregard T. Shagnasty

"Often"? XHTML is almost *always* served incorrectly as text/html. If authors used the proper DOCTYPE, 80% of visitors (Internet Explorer users) would not be able to view the page. It is not the display error handling.

Beauregard T. Shagnasty

Heh. Now I *know* you do not understand the subject. Magic talisman, indeed. :-/ There isn't anything in that comment which makes sense.

mattohare

One that will define the 'recommendation' and what happens with errors and such? LOL

chris

how people "think" something should be. You're dealing with that now. People hear "doctype doesn't matter" and apply that to the whole thing as a unit (the whole page being rendered) even though you meant it in a more specific way (part of the process). This IS the problem with HTML/xhtml/css standards too. Companies read the standards and implement them just a little bit differently because of the same thing. My personal peeve is how FF does padding in divs.

Justin James

... that the document is being delivered with a Content-Type for XML, not text/html? That will make a difference too. The browser logic goes like...

if (Content-Type == XMLType) {
    Parse as XML;
} else {
    if (DOCTYPE matches 'Quirks Mode') { Parse as HTML/Quirks Mode; }
    if (DOCTYPE matches 'Almost Standards Mode') { Parse as HTML/Almost Standards Mode; }
    if (DOCTYPE matches 'Standards Mode') { Parse as HTML/Standards Mode; }
}

So, depending on factors, you could get 4 different parsings in 2 major groups (the XML semantics group and the HTML semantics group). J.Ja
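The branching described above can be turned into a small runnable sketch in Python. The matching rules here are deliberately crude assumptions made for illustration (no DOCTYPE means quirks mode; a Transitional or Frameset DOCTYPE means almost standards mode; anything else means standards mode). Real browsers use longer and more particular lists, and the function name and mode labels are invented.

```python
# Content types treated as XML in this sketch (an assumed, incomplete set).
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_mode(content_type, doctype):
    """Pick a mode from the Content-Type header and the DOCTYPE text (or None)."""
    # Drop parameters such as "; charset=iso-8859-1" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in XML_TYPES:
        return "xml"
    if not doctype:
        return "quirks"
    text = doctype.lower()
    if "transitional" in text or "frameset" in text:
        return "almost-standards"
    return "standards"
```

Note that the Content-Type is checked first: in this sketch a document served as XML never reaches the DOCTYPE branches at all, which matches the two major groups described in the comment.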

bmc1416

Don't browsers also go into quirks mode when Frontpage extensions are on the server?

ByteBin-20472379147970077837000261110898

You said in your article that browsers ignore DOCTYPE. And in your above post you say that they act on it by implementing different "modes". If they are implementing anything at all based on DOCTYPE then technically they are not ignoring it. I've had to use a DOCTYPE in order to get a page to display the same way in IE and in Firefox. Sometimes that's all that is needed to fix a misaligned page.

bradleyross

There are many things that the W3C says that you should do.
You should specify a title in the head section.
You should use http-equiv="Content-Script-Type" content="text/javascript" to indicate that Javascript is your default scripting language.
When the page is first created, one member of each radio button set should be selected.
There are many more.
Doing these may or may not make a difference now but can prevent strange behavior in the future as things change.

mattohare

I made the business decision to use xhtml and css. Even wrote my own xml generator for this since the .net one had precious little documentation for building and writing the documents at the time. My old tool got lost in a massive harddisk crash last march, so I'm taking the opportunity to rewrite it inheriting the .net classes that now have docs to tell me how to use them. I'm proud to say that I made a major success on that front last night. Got the new tool to create its first 'hello world' type page!

Justin James

"Ok, I did. Quite the strange thread, and not at all related to the current in-use version - HTML 4.01 - rather about a proposed HTML 5. That is going to be awhile, and no current widely-used browsers use it." Not true. HTML 5 is in use *now*. Firefox, Opera, and Webkit (Safari, Chrome, etc.) all are currently shipping with support for the current draft; tags such as the new audio and video tags, for example. IE 8 will be shipping with support for many of the items in the current draft. Google is already working hard to get aspects of the "Web Workers" (asynchronous processing system) implemented. Etc. I know, it's hard to believe, but HTML 5 functionality is actually shipping today. "Meantime, please don't mislead authors by saying DOCTYPE in today's world is not important. Thanks for your consideration." I agree that the original post is not worded specifically enough, and as a result, threads like this got started. It *is* important, just not in how the HTML itself is parsed and interpreted as the semantic skeleton of the document (I am fairly certain that we agree here). For example, the blink and marquee tags do not exist in HTML 5. I bet that even if you mark a document with the HTML 5 DOCTYPE (not decided upon yet, by the way) and use blink or marquee, they will work. The fact that video and audio work in browsers that support them, despite there not being an HTML 5 DOCTYPE yet, is further evidence of what I am talking about. So yes, I agree that the original statements were misleading (although certainly not deliberately so!), but the message that I was trying to convey is valid. J.Ja

Beauregard T. Shagnasty

"I suggest you check out the thread on HTML 5 that gave me this information:" Ok, I did. Quite the strange thread, and not at all related to the current in-use version - HTML 4.01 - rather about a proposed HTML 5. That is going to be awhile, and no current widely-used browsers use it. quote: Justin James: > Thoughts? Am I off my rocker? Andrew Sidwell: In short, yes; /quote Meantime, please don't mislead authors by saying DOCTYPE in today's world is not important. Thanks for your consideration.

Beauregard T. Shagnasty

[If authors used the proper DOCTYPE] Sorry, I meant to type: If authors used the proper Mime-type. application/xhtml+xml Internet Explorer will just ask you if you want to "download the file."
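One common workaround for the download prompt is to look at the request's Accept header and only send the XML MIME type to clients that list it. The following Python sketch shows the idea under simplifying assumptions: it ignores q-values and wildcards, and the function name is hypothetical rather than part of any server framework.

```python
def choose_content_type(accept_header):
    """Serve XHTML as XML only to clients that explicitly say they accept it."""
    # Split "type/subtype;q=0.9, ..." into bare, lower-cased media types.
    accepted = [part.split(";")[0].strip().lower()
                for part in accept_header.split(",")]
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml"
    # Fall back to text/html for everything else, including clients that
    # only send wildcards such as */*.
    return "text/html"
```

The trade-off is that the same markup is then parsed as XML by some visitors and as HTML by others, so the differences in error handling discussed in this thread still apply.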

mattohare

Our local assembly is still trying to sort out charging for water service in a fair way without the expense of installing water meters. This could have them putting little internet pipes all over the province and each meter with its own ip address. LOL

Justin James

It's how we end up with US Congressmen talking about the Internet's "pipes". :) J.Ja

mattohare

There was an article in one of last week's newsletters about using metaphors to communicate technical stuff better. The conversation there seemed a bit fragmented, but this strand of this thread seems to address an issue I've had. I do use metaphors for very localised conversations. I avoid trying to use them in long term use because it simply replaces the technical terms with silly words that sometimes put people off. I use metaphor and analogy to define the technical terms for future use. That seems to keep such relationships on a more professional level. And, the non-tech stakeholders come away with a better understanding in general.

Jaqui

is text/xml though text/xhtml also gives correct rendering of the css. text/html doesn't

Jaqui

since frontpage extensions on the server allow for frontpage to interface with the server, and the server to handle the activex controls frontpage embeds into the site.

Justin James

As deepsand pointed out, the variety of user agents (consumers of HTML that are not necessarily Web browsers) means that "may" and "should" are here to stay throughout the spec. Even amongst the genre of UAs known as "Web browsers", there is so much possible difference in capabilities that "must" is pretty hard to put in there. Lynx comes up a lot as an example. "Oh, such and such tag 'must' be rendered like this by default? How exactly is Lynx going to do that?" Not that anyone thinks that Lynx is widely used, or that most Web pages will even be sensible in it as it is, but it is a good benchmark for non-browser UA behavior. There are also a lot of concerns around accessibility which "may" and "should" are often used to address. So yes, while it would be good in 90% - 95% of use cases to have "must" instead of "may" or "should", it's the small minority which drive the use of the more flexible wording. And on the Web, even 1% of pages is many millions of pages from hundreds of thousands of authors. Ah, the joys of a widely adopted spec! :) J.Ja

deepsand

Browsers are not the only rendering agents that presently exist or will exist in the future. Therefore, any presentation standard must allow for the very kind of flexibility that you decry.

Jaqui

since it would be absolutely huge. far better to have two standards, one covering styling and one defining allowed content tags. both have to lose the "may", "should" and use must everywhere. remove options for how the elements are handled by the browsers and put required handling to make it easier for a site to work with all browsers.

mattohare

Really, if there's one thing I learned even more from the discussion here is that CSS and HTML are two sides of the same coin. It's still one coin, a page that has to render and display properly and predictably. Stop the buck (or the quid) right here at one standard.

Justin James

"Styling" requires the ability to position, but somewhere along the way, CSS having the ability to position became CSS being used to perform layout. HTML, unfortunately, has few capabilities to perform layout of its own, without using tables. You are right, it's a "fudge". HTML claims, "whoa, I'm a semantic language, all of that positioning and color stuff I picked up along the way doesn't belong here, give it to someone else!" Meanwhile, CSS is saying, "hey pal, if you want to make something look good, fine, I can do that, but I really don't think I should bear all of the responsibility... after all, it is HTML that defines how the elements relate to each other and nest and such!" It's a mess. I'd love to see it all go away in favor of something actually designed for this kind of work, and allow HTML to be relegated to a documents-and-basic-forms-only role; anything needing supreme layout/styling control by the designer would be done in something else. J.Ja

mattohare

I agree, strictness and well-formedness reduce the fudge factor. I find it never quite goes away. HTML5, in the end, will get there. We get the same standards management process around CSS, and I can see a total disappearance of the fudge factor. I think we still have a bit too much that's not firmly defined yet.

Spiritusindomit

The whole damn thing is a thousand bandaids on top of TEX and the only reason HTML and XML still exist is web developers and people in general are complacent. Everything you do in HTML, XHTML, Javascript and any other web technology is at least 5 hacks deep and so convoluted no one even remembers how it got that way. We should have long ago made something better because there is still an enormous gap between web devs and real devs.

Jaqui

it is meant for styling not layout. the markup languages are meant for layout. The problems lie in the fudge space of the spec, in that it doesn't have a required handling of the elements. it's the capabilities of both (x)HTML and CSS combined that do control how any browser [i]should[/i] handle the elements. That is actually the one benefit to a "strict" xml doctype with a well formed stylesheet, the fudge factor for rendering the document goes away.

Justin James

Layout is not HTML. HTML is a semantic language. As stated elsewhere in this thread, the fact that a browser decides to present a document differently based upon DOCTYPE does not mean that it interpreted the HTML any differently. You are talking about a CSS problem, as evidenced by "float" and "fixed" in your post. :) J.Ja

ken

The author of the article is so very WRONG! I just learned this lesson today, 11/18/08, that without a transitional doctype IE misinterprets WIDTHs incorrectly - at least if a FIXED element is next to a FLOATed element. DOCTYPE matters! The author should remove this item from the article.

Justin James

You are right that I said that... it was poor wording on my part originally. I should have been more specific about what browsers ignore. They seem to parse the HTML itself the same regardless of DOCTYPE (things like error handling, for example), but they apply different rendering rules, CSS, and other items that are defined outside of the HTML spec based upon DOCTYPE. Sorry about that! J.Ja
