JavaScript is both the most ubiquitous and the most useful tool for a Web application developer. Because it runs in the browser as an interpreted language acting on the live page, you have to be extremely careful when dealing with the Document Object Model (DOM).
The DOM is the data structure your browser uses to represent Web pages. Structurally it’s a tree of tags: each tag may contain multiple tags below it, mirroring the nesting of an HTML (or XML) document. You can use JavaScript to examine or change the contents of the DOM dynamically after a page has been loaded; this is one of the major foundations of AJAX.
Moving around the DOM in JavaScript can be tricky, however, because references into the DOM are live: the objects they point to can change while you’re using them. The intuitive algorithm will often not work, especially if you’re used to the behaviour of languages which allocate extra storage and cache variables when you change scopes.
I’ll illustrate this with a short example. We’ll write a function in JavaScript that replaces the current page with a text representation of the DOM tree, letting you see what’s under the hood of the page. We’ll use the following sample page:
<html>
  <head><title>Test Title</title></head>
  <body>
    <p>Test Text</p>
    <p><a href="http://builderau.com.au/">BuilderAU</a></p>
  </body>
</html>
Now we’ll add a <script> element at the top of the file containing the function printDOM, which should print out the DOM. First set your body tag to run this function on the document when loaded:
<body onload="printDOM(document)">
Now let’s take a first attempt at the function:
function printDOM(x) {
  // Write this node, then recurse into each of its child nodes.
  document.write(x);
  for (var i = 0; i < x.childNodes.length; i++) {
    printDOM(x.childNodes[i]);
  }
}
This should print the following:
[object HTMLDocument][object HTMLHtmlElement][object HTMLHeadElement][object HTMLBodyElement][object Text]
At first glance this appears to be working, but we’re missing the biggest part of the document: both <p> elements and the link. The reason is simple: as we traverse the DOM, we change it by writing to it. Calling document.write after the page has finished loading opens a new document, which discards the existing content. Therefore by the time we get to examining the children of the HTML elements, they have changed and now contain the data that we’re writing to the document. The lesson here is that you cannot write to the document and examine its old contents at the same time.
We can fix this by printing the DOM in two passes. On the first pass we store the nodes in a secondary array; we then loop through this array, writing to the document as we go:
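The failure can be reproduced outside the browser with a rough simulation. In the sketch below, makeNode and makeFakeDocument are hypothetical helpers built from plain objects, not real DOM nodes, and the fake write method only imitates the part of document.write that matters here: the first call after loading throws away the old tree. A real browser also builds a fresh skeleton document, which this sketch leaves out.

```javascript
// Plain-object stand-in for a DOM node (an assumption for illustration).
function makeNode(name, children) {
  return { nodeName: name, childNodes: children || [] };
}

// A fake "loaded" document whose write() imitates document.write after load.
function makeFakeDocument() {
  var doc = makeNode("#document", [
    makeNode("HTML", [
      makeNode("HEAD", [makeNode("TITLE")]),
      makeNode("BODY", [makeNode("P"), makeNode("P")])
    ])
  ]);
  doc.output = [];
  doc.opened = false;
  doc.write = function (text) {
    if (!doc.opened) {
      doc.childNodes = []; // the first write discards the old tree
      doc.opened = true;
    }
    doc.output.push(text);
  };
  return doc;
}

// Naive traversal: writes while it walks, as in the first attempt.
function naivePrint(node, doc) {
  doc.write(node.nodeName);
  for (var i = 0; i < node.childNodes.length; i++) {
    naivePrint(node.childNodes[i], doc);
  }
}

var fakeDoc = makeFakeDocument();
naivePrint(fakeDoc, fakeDoc);
console.log(fakeDoc.output.join(" ")); // prints "#document"
```

Only the root is ever visited: by the time the loop asks for the root’s children, the write has already replaced them.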
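// First pass: record every node in document order.
function traverseDOM(x, arr) {
  arr.push(x);
  for (var i = 0; i < x.childNodes.length; i++) {
    traverseDOM(x.childNodes[i], arr);
  }
  return arr;
}

// Second pass: write out the recorded nodes.
function printDOM(x) {
  var arr = traverseDOM(x, []); // declared with var so it does not leak as a global
  for (var i = 0; i < arr.length; i++) {
    document.write(arr[i]);
  }
}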
This gives us the following output:
[object HTMLDocument][object HTMLHtmlElement][object HTMLHeadElement][object HTMLScriptElement][object Text][object HTMLTitleElement][object Text][object HTMLBodyElement][object Text][object HTMLParagraphElement][object Text][object Text][object HTMLParagraphElement]http://builderau.com.au/[object Text][object Text]
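To see why the two-pass approach holds up, here is a minimal sketch that runs outside the browser. The makeNode helper and the plain-object tree are stand-ins assumed for illustration, and clearing tree.childNodes only imitates the page being wiped by the first write.

```javascript
// Plain-object stand-in for a DOM node (an assumption for illustration).
function makeNode(name, children) {
  return { nodeName: name, childNodes: children || [] };
}

// Pass 1: record every node in document order without producing output.
function collectNodes(node, out) {
  out.push(node);
  for (var i = 0; i < node.childNodes.length; i++) {
    collectNodes(node.childNodes[i], out);
  }
  return out;
}

var tree = makeNode("HTML", [
  makeNode("HEAD", [makeNode("TITLE")]),
  makeNode("BODY", [makeNode("P"), makeNode("P", [makeNode("A")])])
]);

var snapshot = collectNodes(tree, []);

// Imitate the first write wiping the page.
tree.childNodes = [];

// Pass 2: the array still references every node that was recorded.
var names = [];
for (var i = 0; i < snapshot.length; i++) {
  names.push(snapshot[i].nodeName);
}
console.log(names.join(" ")); // prints "HTML HEAD TITLE BODY P P A"
```

The array holds its own references to the nodes, so emptying the tree afterwards does not affect what the second pass sees.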
The problem with this two-pass output is that it’s a mess: nothing indicates which tags are children of other tags. (The link appears as its URL because converting an anchor element to a string yields its href.) Ideally we’d like to insert some formatting tags when we print out the tree. We could solve this by making the first-pass array multi-level, with a nested array for each tag’s children, but this would complicate the printing. There is an easier way.
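For comparison, the multi-level alternative might look something like the sketch below. It is only one possible shape for the idea: makeNode, snapshotTree and renderTree are hypothetical names, and the tree is made of plain objects so the code runs without a browser. Note how the printing step now has to recurse and track depth as well.

```javascript
// Plain-object stand-in for a DOM node (an assumption for illustration).
function makeNode(name, children) {
  return { nodeName: name, childNodes: children || [] };
}

// Pass 1: build a nested snapshot that mirrors the tree's shape.
function snapshotTree(node) {
  var kids = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    kids.push(snapshotTree(node.childNodes[i]));
  }
  return { label: node.nodeName, children: kids };
}

// Pass 2: the printer must now recurse too, indenting by depth.
function renderTree(snap, depth, lines) {
  lines.push(new Array(depth + 1).join("  ") + snap.label);
  for (var i = 0; i < snap.children.length; i++) {
    renderTree(snap.children[i], depth + 1, lines);
  }
  return lines;
}

var tree = makeNode("BODY", [
  makeNode("P"),
  makeNode("P", [makeNode("A")])
]);

var lines = renderTree(snapshotTree(tree), 0, []);
console.log(lines.join("\n"));
// BODY
//   P
//   P
//     A
```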
Since we are writing directly to the document, we can take advantage of the fact that the browser knows how to interpret text containing tags. That is, if we write a tag to the document it will be interpreted rather than being directly printed.
So, for example, if we add definition list elements to our array on the first pass they will format the resulting document:
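// First pass: record each node, wrapped in definition-list tags.
function traverseDOM(x, arr) {
  arr.push("<DL><DT>");
  arr.push(x);
  for (var i = 0; i < x.childNodes.length; i++) {
    arr.push("<DD>"); // each child becomes an indented definition
    traverseDOM(x.childNodes[i], arr);
  }
  arr.push("</DL>");
  return arr;
}

// Second pass: write out the recorded tags and nodes.
function printDOM(x) {
  var arr = traverseDOM(x, []); // declared with var so it does not leak as a global
  for (var i = 0; i < arr.length; i++) {
    document.write(arr[i]);
  }
}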
We get the resulting text in our document:
[object HTMLDocument]
    [object HTMLHtmlElement]
        [object HTMLHeadElement]
            [object HTMLScriptElement]
                [object Text]
            [object HTMLTitleElement]
                [object Text]
        [object HTMLBodyElement]
            [object Text]
            [object HTMLParagraphElement]
                [object Text]
            [object Text]
            [object HTMLParagraphElement]
                http://builderau.com.au/
                    [object Text]
            [object Text]
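It can help to look at the markup string that is actually handed to the browser. The sketch below builds it for a tiny tree, using the standard nodeName property as a shorter label in place of the default string conversion. The makeNode helper and the plain-object tree are assumptions for illustration so that the code runs without a browser, and toDefinitionList is a hypothetical name.

```javascript
// Plain-object stand-in for a DOM node (an assumption for illustration).
function makeNode(name, children) {
  return { nodeName: name, childNodes: children || [] };
}

// Same structure as traverseDOM, but pushing nodeName labels.
function toDefinitionList(node, parts) {
  parts.push("<DL><DT>");
  parts.push(node.nodeName);
  for (var i = 0; i < node.childNodes.length; i++) {
    parts.push("<DD>");
    toDefinitionList(node.childNodes[i], parts);
  }
  parts.push("</DL>");
  return parts;
}

var tree = makeNode("P", [makeNode("A")]);
var markup = toDefinitionList(tree, []).join("");
console.log(markup); // prints "<DL><DT>P<DD><DL><DT>A</DL></DL>"
```

Each child sits inside a <DD> of its parent’s list, which is what makes the browser indent it one level further.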
The definition-list output is closer to what we were after in the beginning. Modifying the document in JavaScript is often not as simple as you would think. You have to be careful not to invalidate the assumptions (in this case the loop invariant) you made when writing the function. Changes you make are immediately visible to every part of your script, so stay acutely aware of them, or you can get stuck searching for information that is no longer there.