Remedial XML for programmers: Basic syntax

Missed the XML boat? Get started with the basics. This article kicks off a three-part series that will help you develop your XML skills.

Maybe you've been stranded on a desert island talking to volleyballs, hiding in a cave, or simply avoiding all "Web stuff." Whatever the reason, you're lacking in XML savvy and want to remedy the situation. Well, you're in the right place. In this first installment in a three-part series, I'll introduce you to XML and its basic syntax. Later articles will cover data validation and using XML via a parser.

So what's the big deal?
I once worked with a developer who considered XML redundant. He asked me, "We've already got HTML, which works fine. Why do we need another markup language?" He was, unfortunately, missing the point. HTML is exclusively a presentation language: browser incompatibilities and proprietary extensions aside, it makes it possible to view the same data in the same way on multiple platforms.

Although most Web browsers can display XML (the fact that so confused my developer friend), the language itself has nothing to do with displaying data. Instead, imagine a way to store data and describe the data's context at the same time, and you've got XML.

It's this ability to combine data with information describing its structure that makes XML so incredibly useful as a data exchange technology. For example, take two applications that store data in their own proprietary formats and try getting them to play nice and talk to each other. Most of your time on such a project would be spent designing and coding the mechanism used to transform data from application A's format to that used by application B. XML and its attendant technologies are ideally suited to solve such a problem, with minimal effort on your part.

Your basic XML document
Listing A shows a canonical example of an XML document describing a list of books. Incidentally, information in XML format is typically referred to as a document regardless of whether it's actually housed in a file on disk.

The first thing you'll notice is that XML is tag-based. If you've ever looked at HTML before, it shouldn't be too disturbing for you. Unlike HTML, however, the tags don't necessarily have a predefined meaning. Instead, they are simply markers for data. Here are a few things that might not be evident just from inspecting a document:
  • All tags must be properly terminated with an end tag, like this: <author>…</author>.
  • Tag names are case-sensitive, so a start tag and its end tag must match exactly. The following would generate an error: <Author>…</author>.
  • Empty elements are allowed and can be written in one of two ways: <comments/> or <comments></comments>.
  • Attribute values, covered later in this article, must always be enclosed in matching single or double quotes.
  • The combination of start tag, data, and end tag is referred to as an element.
  • An element may contain other elements, as long as the end tags of the contained elements come before the end tag of the containing element.
  • Every XML document must have exactly one root element that contains all other elements in the document.
  • Markup beginning with an exclamation point typically indicates a declaration or directive for the XML parser, while markup beginning with a question mark is a processing instruction or, in the header or prologue of a document, the XML declaration.
  • The one exception to the previous rule is the comment. Comments are ignored by the parser and are enclosed in comment delimiters: <!-- This is a comment -->.
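If you'd like to see a parser enforce these rules, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The snippets and the is_well_formed helper are my own illustrations, not part of Listing A, and the parser is just one convenient way to check syntax.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, roughly one per rule from the list above.
checks = {
    "matching tags": "<author>Jane Doe</author>",
    "case mismatch": "<Author>Jane Doe</author>",
    "empty, short form": "<comments/>",
    "empty, long form": "<comments></comments>",
    "unquoted attribute": "<book id=bk101/>",
    "bad nesting": "<book><title>XML</book></title>",
    "two roots": "<book/><book/>",
    "comment": "<book><!-- This is a comment --></book>",
}

results = {name: is_well_formed(snippet) for name, snippet in checks.items()}

for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'error'}")
```

The case mismatch, the unquoted attribute, the bad nesting, and the second root element should all be rejected, while the other snippets parse cleanly.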

Think of an XML document as a tree, with a single root that contains all the other elements. Internet Explorer displays XML documents in this format automatically, as you can see in Figure A:

Figure A
Opening an XML file in IE can help you visualize the document structure.
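If you don't have a browser handy, the same tree view can be approximated in a few lines of code. This is only a sketch: the SAMPLE document and the outline function are hypothetical stand-ins I made up for illustration, not the article's Listing A.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a book list; not the article's Listing A.
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Jane Doe</author>
    <title>Example Title</title>
  </book>
</catalog>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting, with the single root element at the top.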

Every XML document should begin with a header or prologue defining any additional information needed to make sense of the data described. A long list of optional things may appear here, and if used, they must appear in a particular order. You'll usually see at least an XML declaration giving the version, which, when present, must be the very first thing in the document:
<?xml version="1.0"?>

The wonderful world of attributes
Still with me? Now, I'm really going to bend your mind and talk briefly about attributes. An attribute is a name and value pair that can be associated with an XML tag to provide additional information about the tag. Here's an example of an attributed tag taken from Listing A:
<book id="bk101">

This snippet defines a unique identifier for the book described by the current book element. That's what attributes are meant to do—provide additional information about an element without requiring an element to store that information.

You may be asking, "So why couldn't you store that identifier inside the book element as its own id element?" You could, and a lot of people would agree with you. Generally, attribute use is encouraged only when the information modifies the element but isn't specifically part of the element's content. Here, the id attribute probably corresponds to a key in the database table that houses the book information. If so, it's not likely to be modified and would probably be needed only when updating the underlying table, making it a prime candidate for inclusion as an attribute of the book element. Other uses for attributes come into play when you get into data validation and transformations.
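To make the attribute-versus-element choice concrete, here is a small sketch that reads the identifier both ways with Python's standard xml.etree.ElementTree module. Both snippets are hypothetical; only the id="bk101" attribute comes from the example above, and the author name is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets; the author name is made up for illustration.
AS_ATTRIBUTE = '<book id="bk101"><author>Jane Doe</author></book>'
AS_ELEMENT = "<book><id>bk101</id><author>Jane Doe</author></book>"

book_a = ET.fromstring(AS_ATTRIBUTE)
book_b = ET.fromstring(AS_ELEMENT)

# Attributes live on the element itself and are read by name.
id_from_attribute = book_a.get("id")

# Child elements are part of the content and are looked up by tag.
id_from_element = book_b.findtext("id")

# The attribute form keeps the key out of the element's children.
children_a = [child.tag for child in book_a]
children_b = [child.tag for child in book_b]

print(id_from_attribute, children_a)
print(id_from_element, children_b)
```

Either form carries the same value. The difference is that in the attribute form the key sits beside the book's content rather than inside it.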

Data is as data does
I should point out here that XML makes no assumptions about the data you store in an element, or about the number or order of elements in a document. For instance, referring back to Listing A, there's really nothing to prevent me, troublemaker that I am, from sticking the author's name in the publish_date element. That's because in its most basic form, XML describes only the structure of the data it contains, not the format that data should take.
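Here is a quick sketch of that troublemaking in action. The document below is hypothetical, and the looks_like_iso_date helper assumes dates are written as YYYY-MM-DD, which is my assumption rather than anything XML requires. The parser accepts the document without complaint, so any check on the content has to come from somewhere else.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical document with an author's name where a date belongs.
DOC = """<book id="bk101">
  <author>Jane Doe</author>
  <publish_date>Jane Doe</publish_date>
</book>"""


def looks_like_iso_date(text: str) -> bool:
    """Hand-rolled check, assuming dates are written as YYYY-MM-DD."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


# Parsing succeeds: the document is well-formed, whatever it contains.
book = ET.fromstring(DOC)
raw_date = book.findtext("publish_date")

parsed_ok = book.tag == "book"
date_ok = looks_like_iso_date(raw_date)

print(f"parsed: {parsed_ok}, publish_date looks like a date: {date_ok}")
```

A hand-written check like this works for one field, but it doesn't scale to a whole document.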

If you want to enforce some kind of order in an XML document, which is generally a good idea (especially if I'm around), you can provide either a Document Type Definition (DTD) or an XML Schema for your document. Both of these techniques will be the subject of the next article in my remedial XML series.

