XML lends itself to formal document descriptions that pin down syntax, structure, and content. It has been used as much for text-oriented documents as for many kinds of record-structured data, but in many cases, formalizing and documenting data is far easier than formalizing and documenting document structures. The key is to let the structure of the data drive the way XML is used to represent it, tempered by the ways in which object content must be further parsed and used.
To anyone with a database background, learning and using XML is usually easy. In fact, an increasing number of database engines, including Oracle and SQL Server, can routinely produce and use XML descriptions of data at many different levels. Let's take a look at how a simple, structured form of data works when you're mapping data structures to XML.
A few considerations
Let’s consider the address records in Microsoft Outlook’s Contacts folder as our structured form of data. Figure A shows my personal contact information captured as a screen shot. I’ll use it as a simple source of information to explain how to map from record or data structures to XML.
To begin, let's review a few points about XML:
- Every XML document requires exactly one outermost container object (the root element); in this case, let’s call it <address-book>.
- Although objects in XML can nest more or less arbitrarily within any given container object, structural occurrence indicators govern whether a contained object must appear or may appear. If an object does appear, it’s also possible to specify how many times it may or must appear. Figure B gives an expanded view of the Full Name object in Outlook, showing the objects that nest within it.
- Numerous factors come into play when you're deciding if information associated with an object should be an attribute (that is, a data value associated with an object that is not part of that object’s content), a nested contained object (and therefore part of the object’s content), or object content outright.
- A great strength of XML is that Extensible Stylesheet Language Transformations (XSLT) let you transform a document into other XML forms as needed. If you don’t get things right on the first try, or if your data changes its structure or content, you can adjust your XML document definitions to follow suit, and you can turn the old form into the new form with relative ease. This is often easier and more straightforward than exporting a database from one schema, massaging the resulting data, and importing it into some other schema.
Figure B: Objects nested in the Full Name object in Outlook
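The last point in the list above, turning an old XML form into a new one, can be illustrated without XSLT itself. Python's standard library has no XSLT processor, so the following sketch is not XSLT; it does the same kind of restructuring by hand with xml.etree.ElementTree. The element names (full-name, first, middle, last), the sample name, and the helper split_full_name are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "old" form: the full name is one undifferentiated string.
OLD_XML = """<address-book>
  <contact>
    <full-name>Jane Q Public</full-name>
  </contact>
</address-book>"""


def split_full_name(root):
    """Replace each <full-name> text with <first>, <middle>, <last> children.

    Assumes space-separated names with at least a first and a last part;
    anything in between is treated as the middle name.
    """
    # Copy the matches into a list so the tree is not modified mid-iteration.
    for full_name in list(root.iter("full-name")):
        parts = (full_name.text or "").split()
        if len(parts) < 2:
            continue  # leave names we cannot split untouched
        full_name.text = None
        ET.SubElement(full_name, "first").text = parts[0]
        if len(parts) > 2:
            ET.SubElement(full_name, "middle").text = " ".join(parts[1:-1])
        ET.SubElement(full_name, "last").text = parts[-1]
    return root


new_root = split_full_name(ET.fromstring(OLD_XML))
print(ET.tostring(new_root, encoding="unicode"))
```

An XSLT stylesheet would express the same mapping declaratively; the hand-written version is used here only because it runs with nothing beyond the standard library.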
Here’s a reasonable version of my contact record in XML format:
<Street>2207 Klattenhoff Drive</Street>
What is interesting about using XML to capture this kind of information is that occurrence indicators quickly reveal that a valid Address record object needs to contain only a name. The Address record won’t be useful for e-mail unless at least one e-mail address is defined, and you can't call a person identified in a record without a phone number, but Outlook will happily let you skip many fields because they’re optional, not required. This also explains why careful analysis of data is often required to flesh out the full set of containers and contained objects, so that the most comprehensive XML can be described and used.
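To make the idea of occurrence indicators concrete, here is a minimal sketch that checks a contact record against DTD-style indicators (no indicator for exactly once, ? for zero or one, * for zero or more, + for one or more). It is not a DTD validator, and the element names, the RULES table, and the check_occurrences helper are assumptions made for illustration; the only thing taken from the discussion above is that a name is required and everything else is optional.

```python
import xml.etree.ElementTree as ET

# Occurrence rules keyed by child element name, using DTD-style indicators:
#   ""  exactly once, "?" zero or one, "*" zero or more, "+" one or more.
# The element names are hypothetical; only the name is required.
RULES = {"full-name": "", "email": "*", "phone": "*", "address": "?"}

# Each indicator maps to (minimum, maximum); None means "no upper limit".
BOUNDS = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}


def check_occurrences(contact, rules=RULES):
    """Return a list of human-readable violations for one <contact> element."""
    problems = []
    for name, indicator in rules.items():
        low, high = BOUNDS[indicator]
        count = len(contact.findall(name))
        if count < low:
            problems.append(f"<{name}> needs at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"<{name}> allows at most {high}, found {count}")
    return problems


minimal = ET.fromstring("<contact><full-name>Jane Public</full-name></contact>")
nameless = ET.fromstring("<contact><email>jane@example.com</email></contact>")

print(check_occurrences(minimal))   # a name alone satisfies every rule
print(check_occurrences(nameless))  # the missing <full-name> is reported
```

A record that passes these checks is structurally valid but may still be useless for a given task, which is the distinction the paragraph above draws between valid and useful records.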
Notice also that container objects in an Outlook contact record often appear as buttons in the interface. Clicking the button reveals the underlying objects contained (as in Figure B), where the various component name fields in a Full Name object are explicitly revealed. The same is true for the Address record, where the street address, city, state or province, ZIP or postal code, and country information may also be captured (or supplied by intelligent defaults).
The trick to deciding how object content should be represented is understanding how that data is likely to be used. The Full Name record, for instance, breaks down nicely into various contained objects so that users can search on first or last name when looking for specific records. When deciding whether object data should be an attribute or content (which also includes contained objects), the key is recognizing whether that data is useful by itself to users. If so, it’s better treated as content than as an attribute. If it’s of interest only to developers or applications, it can safely be treated as an attribute. The decision about whether to break an object’s content into named objects or simply to supply it as undifferentiated text or other values depends entirely on whether that content has additional structure that needs to be captured.
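The following sketch applies those rules of thumb to a single record. It assumes a hypothetical id value that only applications care about (so it becomes an attribute), name parts that users search on (so they become contained objects), and a free-form note with no further structure (so it stays plain text). None of these element or attribute names come from Outlook; they are placeholders.

```python
import xml.etree.ElementTree as ET

book = ET.Element("address-book")

# "id" is bookkeeping for applications, so it is stored as an attribute.
contact = ET.SubElement(book, "contact", {"id": "c-001"})

# Name parts are useful to people on their own, so they are nested content.
full_name = ET.SubElement(contact, "full-name")
ET.SubElement(full_name, "first").text = "Jane"
ET.SubElement(full_name, "last").text = "Public"

# A note has no further structure worth capturing, so it stays plain text.
ET.SubElement(contact, "notes").text = "Met at the conference."


def find_by_last_name(root, last):
    """Return every <contact> whose <full-name>/<last> text equals `last`."""
    return [c for c in root.iter("contact")
            if c.findtext("full-name/last") == last]


matches = find_by_last_name(book, "Public")
print([c.get("id") for c in matches])
print(ET.tostring(book, encoding="unicode"))
```

Because the last name is its own element, the search is a simple path lookup; had the name been stored as one string, the same query would need to parse text first.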
A more sophisticated implementation of the phone number data in this kind of XML might further differentiate among area codes, local exchanges, and suffixes. Or, if international support is needed, you might take a closer look at phone number structure and include dialing strings, country codes, area codes, and various ways to represent local telephone number structures.
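As a rough sketch of the first, simpler case, the code below splits a phone number into area code, exchange, and suffix elements. It assumes a 10-digit North American style number; the element names, the type attribute, the sample number, and the phone_to_element helper are hypothetical, and international numbers would need a richer pattern than the one shown.

```python
import re
import xml.etree.ElementTree as ET

# Assumes a 10-digit North American style number such as "(512) 555-0100".
# Punctuation and spaces between the digit groups are ignored.
PHONE_PATTERN = re.compile(r"^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$")


def phone_to_element(raw, kind="business"):
    """Split a raw phone string into area code, exchange, and suffix elements."""
    match = PHONE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognized phone number: {raw!r}")
    # The "type" attribute is an assumed label for the kind of number.
    phone = ET.Element("phone", {"type": kind})
    for tag, digits in zip(("area-code", "exchange", "suffix"), match.groups()):
        ET.SubElement(phone, tag).text = digits
    return phone


element = phone_to_element("(512) 555-0100")
print(ET.tostring(element, encoding="unicode"))
```

Numbers that do not fit the assumed pattern raise ValueError rather than being stored in a misleading structure, which leaves room to add country codes or dialing strings later.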