Working with XML entities

XML entities are often overlooked, but they are a powerful tool for XML developers. Learn how to use them effectively in your DTDs, either as placeholders or to retrieve external data.

An XML entity can play several roles: a placeholder for repeated text (a kind of shorthand), a section of external data (XML or otherwise), or part of an element declaration. W3C XML Schema does not declare entities itself, and I have seen examples that use a schema to define the XML structure and mix it with DTD-style entities.

Entities appear in most published DTDs and schemas, where they organize repeated values, represent special symbols (e.g., formatting characters and the copyright symbol), or define parameters for elements. You may have worked with XML entities without realizing it, because HTML includes many of them (e.g., &nbsp; for a nonbreaking space).

The XML specification predefines only five entities (&lt;, &gt;, &amp;, &apos;, and &quot;), but it lets developers and architects declare their own entities for use throughout the document. Entity references in XML follow the same pattern as in HTML and SGML: an ampersand (&), the name of the entity, and a semicolon (;).
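To see the five predefined entities at work, here is a small sketch using Python's standard xml.etree.ElementTree module (Python and the expand helper are my choices for illustration, not part of the article). Each predefined entity expands without any DTD, while an HTML entity such as &nbsp; is rejected because nothing declares it.

```python
import xml.etree.ElementTree as ET

# The five entities every XML parser recognizes without a DTD declaration.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

def expand(name):
    """Return the text of <p>&name;</p>, or None if the parser rejects it."""
    try:
        return ET.fromstring(f"<p>&{name};</p>").text
    except ET.ParseError:
        # Raised for undeclared entities such as HTML's nbsp.
        return None

for entity_name, character in PREDEFINED.items():
    print(f"&{entity_name}; -> {expand(entity_name)!r}")
print("&nbsp; ->", expand("nbsp"))
```

To use &nbsp; in an XML document, you would have to declare it yourself, for example as an internal entity whose value is the character reference &#160;.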

Entities used as placeholders
The sample below uses an internal DTD. In this scenario, an online mall system requires each store to submit this document, which is then processed by a larger system. A store has a name, a phone number, some promotional text, and a logo.

Because the name is stored in an entity, a change to the store's name doesn't require the XML editor or application to search the entire document for every place the name is referenced; only the storeName entity needs to change. After you create the XML file, you can upload it to a server for processing. The store owner can also submit a much simpler file containing only the text for the store's promotion or tag line, which the document pulls in through the storePromo entity. If that information changes, you need only submit a new text file. The store document looks like this:
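<?xml version="1.0"?>
<!DOCTYPE mall [
<!ELEMENT mall (store)>
<!ELEMENT store (name,phone,promo,image)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT promo (#PCDATA)>
<!ELEMENT image (#PCDATA)>
<!ATTLIST image width CDATA #IMPLIED>
<!ENTITY storeName "Sample Store">
<!ENTITY storePromo SYSTEM "./StorePromo.txt">
]>
<mall>
<store>
<name>&storeName;</name>
<phone><!-- store phone number --></phone>
<promo>&storePromo;</promo>
<image width="200">logo.gif</image>
</store>
</mall>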
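The following sketch shows both entities in this example being expanded. It assumes Python's standard xml.sax module and writes a trimmed version of the store document plus a StorePromo.txt file to a temporary folder; the file contents, the TextByElement class, and the read_store helper are hypothetical. It illustrates two points: changing the single storeName declaration changes every &storeName; reference, and Python (3.7.1 and later) fetches an external entity such as storePromo only after feature_external_ges is switched on. The promo text is passed through saxutils.escape so that & and < in it do not break the parse.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.saxutils import escape

# Hypothetical, trimmed version of the store document.
# {name} is filled in with str.format before the file is written.
STORE_XML = """<?xml version="1.0"?>
<!DOCTYPE mall [
<!ENTITY storeName "{name}">
<!ENTITY storePromo SYSTEM "./StorePromo.txt">
]>
<mall><store><name>&storeName;</name><promo>&storePromo;</promo></store></mall>
"""

class TextByElement(ContentHandler):
    """Collects the character data reported inside each element, keyed by name."""
    def __init__(self):
        super().__init__()
        self.current = None
        self.text = {}

    def startElement(self, name, attrs):
        self.current = name
        self.text.setdefault(name, "")

    def endElement(self, name):
        self.current = None

    def characters(self, content):
        if self.current is not None:
            self.text[self.current] += content

def read_store(store_name, promo_text):
    """Return the expanded (name, promo) text.

    store_name is assumed to be plain text without quotes, % or &.
    """
    with tempfile.TemporaryDirectory() as folder:
        # Escape &, < and > so the text file is legal as a parsed entity.
        with open(os.path.join(folder, "StorePromo.txt"), "w", encoding="utf-8") as f:
            f.write(escape(promo_text))
        doc_path = os.path.join(folder, "store.xml")
        with open(doc_path, "w", encoding="utf-8") as f:
            f.write(STORE_XML.format(name=store_name))
        parser = xml.sax.make_parser()
        # Python 3.7.1+ leaves external general entities unresolved by default.
        parser.setFeature(feature_external_ges, True)
        handler = TextByElement()
        parser.setContentHandler(handler)
        parser.parse(doc_path)
        return handler.text["name"], handler.text["promo"]

print(read_store("Sample Store", "Fish & chips <today>"))
```

Once the entity is expanded, the escaped characters in the text file come back as a plain & and <.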

Entities to retrieve external data
When working with XML documents, there comes a time when one XML document needs to include another. There are several techniques for this (e.g., XSL and XPointer), but entities are probably the most straightforward and most widely supported. Note that the included text must be valid XML character data, so write & as &amp; and < as &lt;. In the example above, the store's promotional text comes from a plain text file; because promo is declared as #PCDATA, that setup works only with text. An external entity can also contain XML, as in the StorePromo.xml content below, provided the DTD allows those elements inside promo. A binary file such as a .gif image cannot be included as a parsed entity; it needs an unparsed entity declared with NDATA and a notation instead.
<tagline>The best prices in town</tagline>
<description>Check out our prices during this week's sale</description>

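The tagline and description elements must also be declared in the main XML document's DTD or schema. Once the parser expands the entity, the tag line is reached at /mall/store/promo/tagline; the entity name does not appear in the path. A DOM parser that is configured to keep entity reference nodes instead inserts a storePromo EntityReference node between promo and tagline.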
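One way to check the path is to record element paths while parsing. This sketch again assumes Python's xml.sax module; StorePromo.xml holds the two elements shown above, while the store.xml wrapper document, the PathCollector class, and element_paths are hypothetical. Because SAX reports the expanded content rather than the entity reference, no storePromo step appears in the paths.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# StorePromo.xml: an external parsed entity, so it needs no XML declaration
# and may contain more than one top-level element.
PROMO_XML = ("<tagline>The best prices in town</tagline>\n"
             "<description>Check out our prices during this week's sale</description>\n")

# Hypothetical minimal store document that pulls the promo in through an entity.
STORE_XML = """<!DOCTYPE mall [
<!ENTITY storePromo SYSTEM "./StorePromo.xml">
]>
<mall><store><promo>&storePromo;</promo></store></mall>
"""

class PathCollector(ContentHandler):
    """Records the slash-separated path of every element the parser reports."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.paths.append("/" + "/".join(self.stack))

    def endElement(self, name):
        self.stack.pop()

def element_paths():
    with tempfile.TemporaryDirectory() as folder:
        for file_name, text in (("StorePromo.xml", PROMO_XML), ("store.xml", STORE_XML)):
            with open(os.path.join(folder, file_name), "w", encoding="utf-8") as f:
                f.write(text)
        parser = xml.sax.make_parser()
        parser.setFeature(feature_external_ges, True)  # off by default since Python 3.7.1
        handler = PathCollector()
        parser.setContentHandler(handler)
        parser.parse(os.path.join(folder, "store.xml"))
        return handler.paths

for path in element_paths():
    print(path)
```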

Entities that help define the structure of a document
Parameter entities are a different mechanism altogether: they are used inside the DTD itself, for example as shortcuts for an element's parameter list. As DTDs grow, element declarations can become quite complicated. Parameter entities can be declared in an internal DTD subset, but there they may only be referenced between declarations, not inside one, so using them within an element declaration, as this section does, requires an external DTD.
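The internal-subset restriction is easy to observe. This sketch assumes Python's xml.etree.ElementTree, which parses with expat; the two sample documents and the parse_error helper are hypothetical. The same content model is accepted when written out literally but rejected when a parameter entity supplies it from inside the declaration.

```python
import xml.etree.ElementTree as ET

# Content model written out literally: allowed in an internal subset.
LITERAL = """<!DOCTYPE store [
<!ELEMENT store ((retail|food), name)>
]>
<store/>"""

# Parameter entity referenced *inside* a declaration of the internal subset:
# this breaks the XML 1.0 well-formedness constraint "PEs in Internal Subset".
WITH_PE = """<!DOCTYPE store [
<!ENTITY % type "(retail|food)">
<!ELEMENT store (%type;, name)>
]>
<store/>"""

def parse_error(document):
    """Return the parser's error message, or None if the document parses."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return str(err)
    return None

print("literal:", parse_error(LITERAL))
print("with PE:", parse_error(WITH_PE))
```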

Unlike the previous examples, working with parameter entities this way requires a separate DTD file. The original store declaration is simple:
<!ELEMENT store (name,phone,promo,image)>

However, when you add a type to the store element, the declaration becomes more complicated; add a few more parameters, and the element becomes hard to decipher. Parameter entities involve a trade-off: either the element declarations become increasingly complicated, or the DTD's structure does, because it must manage entities as well as elements. Most people choose entities because they make larger DTDs with complicated elements easier to manage, and a parameter entity can be reused for similar elements. At first, the type parameter added to the store element might look like this:
<!ELEMENT store ((retail | food | concourse),name,phone,promo,image)>

With a parameter entity (declared with % and referenced as %name;), you can create a type entity and use it in place of the parameter list. The type entity is declared as follows:
<!ENTITY % type "(retail | food | concourse)">

The store element declaration now returns to a somewhat more readable list like this:
<!ELEMENT store (%type;, name,phone,promo,image)>
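Python's low-level expat binding reports element declarations after parameter entities have been expanded, so it can show what the parser actually sees for the store declaration. This is a sketch under assumptions: the DTD text is served from memory in place of a hypothetical store.dtd file, and render and expanded_models are illustrative helpers, not part of any library. The %type; reference is legal here because the text is parsed as an external DTD subset.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# Hypothetical external DTD content, served from memory for this sketch.
STORE_DTD = """<!ENTITY % type "(retail|food|concourse)">
<!ELEMENT store (%type;,name,phone,promo,image)>
"""

# "store.dtd" is a placeholder system identifier; the handler never opens it.
DOCUMENT = '<!DOCTYPE store SYSTEM "store.dtd"><store/>'

QUANTIFIERS = {model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
               model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+"}

def render(node):
    """Rebuild DTD syntax from pyexpat's (type, quant, name, children) tuple."""
    ctype, quant, name, children = node
    if ctype == model.XML_CTYPE_NAME:
        text = name
    elif ctype == model.XML_CTYPE_CHOICE:
        text = "(" + "|".join(render(child) for child in children) + ")"
    elif ctype == model.XML_CTYPE_SEQ:
        text = "(" + ",".join(render(child) for child in children) + ")"
    else:
        # EMPTY, ANY and mixed content are outside the scope of this sketch.
        raise ValueError(f"unsupported content type {ctype}")
    return text + QUANTIFIERS[quant]

def expanded_models(document, dtd_text):
    """Map each declared element name to its content model after PE expansion."""
    models = {}
    parser = expat.ParserCreate()
    # Ask expat to process the external DTD subset and parameter entities.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_element_decl(name, content_model):
        models[name] = render(content_model)

    def on_external_ref(context, base, system_id, public_id):
        # Parse dtd_text instead of whatever system_id points to.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(dtd_text, True)
        return 1

    parser.ElementDeclHandler = on_element_decl
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(document, True)
    return models

print(expanded_models(DOCUMENT, STORE_DTD))
```

Expat is a non-validating parser, so this only shows the expanded declaration; it does not check the document against it.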

The final DTD file is listed below. This time, I also used an XML document (StorePromo.xml) for the promotional tag line instead of a text file:
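<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % type "(retail|food|concourse)">
<!ENTITY % boolean "(true|false)">
<!ENTITY phonePrefix "ph:">
<!ENTITY storePromo SYSTEM "./StorePromo.xml">
<!ELEMENT mall (store)>
<!ELEMENT store ((%type;),name,phone,promo,image)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT promo (tagline,description)>
<!ELEMENT tagline (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT image (#PCDATA)>
<!ELEMENT retail (#PCDATA)>
<!ELEMENT food (#PCDATA)>
<!ELEMENT concourse (#PCDATA)>
<!ATTLIST image
nosave %boolean; "true">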

Using attributes
Like entities, attribute declarations can supply values that are missing from the document: if an attribute is omitted, the parser fills in the default declared in the DTD. Attributes are a different mechanism, though, and the two are easily confused. You can define a set of valid values for an attribute and store them in a parameter entity; in the example above, the %boolean; entity handles any attribute that needs a true/false pair. Again, this works only in a separate DTD file. Attribute defaults can be useful with pictures or prices, such as shipping charges (e.g., if the shipping attribute is omitted, it defaults to a standard charge).
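Attribute defaults can be observed with Python's low-level expat binding, which by default reports attributes filled in from the DTD alongside those written in the document. In this sketch the true/false enumeration is written out literally, because a %boolean; reference inside an ATTLIST declaration is allowed only in an external DTD; the sample document, the banner.gif image, and the image_attributes helper are hypothetical.

```python
from xml.parsers import expat

# Hypothetical store document; the default for nosave lives in the DTD.
DOCUMENT = """<!DOCTYPE store [
<!ELEMENT store (image*)>
<!ELEMENT image (#PCDATA)>
<!ATTLIST image nosave (true|false) "true">
]>
<store><image>logo.gif</image><image nosave="false">banner.gif</image></store>"""

def image_attributes(document):
    """Return the attributes the parser reports for each <image>, in order."""
    seen = []
    parser = expat.ParserCreate()
    # specified_attributes stays False (its default), so expat also reports
    # attributes supplied by the ATTLIST default.

    def on_start(name, attrs):
        if name == "image":
            seen.append(dict(attrs))

    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen

print(image_attributes(DOCUMENT))
```

The first image never mentions nosave, yet the application still sees nosave="true"; the second image's explicit value overrides the default.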

It takes some time to get used to working with entities in XML, but keep them in mind when you design XML documents and projects. If you use them, it's a good idea to keep a reference list of your entities or to use a DTD parser to keep track of the DTD's design.
