
Solve complex problems with the mixed content element

You have to make difficult decisions when developing XML solutions, such as how to structure the data within your document. Here's one scenario and solution.

This article originally appeared as an XML e-newsletter.

By Brian Schaffner

When developing XML solutions, you're often faced with difficult decisions, and they frequently revolve around how to structure the data within your document. A structure that makes sense in one situation may be a poor fit in another.

We'll examine one scenario where you might have to make this decision and demonstrate an elegant pattern that helps solve it.

The complex content problem

Imagine that you aren't entirely certain what kind of data you're going to put inside a particular element. On top of being unsure what the data is, you're also uncertain how that data is structured. This creates real problems for your document structure, because a DTD expects you to declare each element's content in advance.

Let's take a look at an example. Suppose you have an element called Item in your XML document. For whatever reason, the data that goes into this element may come from a variety of sources (such as different divisions, or a company acquired in a merger). Because the data comes from different sources, some of it may arrive in a predictable, structured format, while some may be loose text that describes the item. The problem is: how do you describe both with a single element definition?

The mixed content solution

One solution is to use a mixed content element: an element whose declared content allows character data (text) and child elements to be interleaved. This simple concept lets you put both structured and unstructured data into a single element at the same time.
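To see what mixed content looks like to a program that reads it, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module (Python is chosen only for illustration; the article itself is language-neutral). The Item below is a shortened, hypothetical variant of the article's example, with some trailing text added so that both kinds of loose text appear. ElementTree stores text that precedes the first child in the parent's .text attribute, and text that follows a child in that child's .tail attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical Item: loose text, a child element, then more loose text.
xml = "<Item>Small metallic device<Name>Metallic Flotalator</Name> sold singly</Item>"
item = ET.fromstring(xml)
name = item.find("Name")

# Text before the first child lives in the parent's .text ...
print(repr(item.text))  # 'Small metallic device'
# ... the child's own content lives in the child's .text ...
print(repr(name.text))  # 'Metallic Flotalator'
# ... and text after the child lives in the child's .tail.
print(repr(name.tail))  # ' sold singly'

# itertext() yields every piece of character data in document order.
print("".join(item.itertext()))  # Small metallic deviceMetallic Flotalator sold singly
```

Code that consumes mixed content has to look in both places; reading only item.text would silently drop " sold singly" in this example.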

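A mixed content element is declared in the Document Type Definition (DTD). Its content model lists #PCDATA first, followed by any permitted child element names separated by |, and when child elements are listed the group must end with * so that it can repeat. Listing A shows a sample DTD that illustrates how to define a mixed content element.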

Listing A: mixedcontent.dtd
<!ELEMENT Order (Item+)>
<!ELEMENT Item (#PCDATA | Description)*>
<!ELEMENT Description (#PCDATA)>

This code has been simplified. It declares that an Order element contains one or more Item elements. Each Item element can then contain text, Description child elements, or any mix of the two. Because of the nature of mixed content, you have less control over the structured data: a DTD mixed content model can restrict which child elements may appear, but not their order or how many times each one occurs.
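Python's standard-library XML parsers do not validate against a DTD (the third-party lxml package does, through lxml.etree.DTD), so the sketch below hand-rolls only the child-element rule that Listing A's Item declaration implies: text may appear anywhere, and Description may appear in any position. The function name fits_mixed_model and the sample fragments are hypothetical; this illustrates the rule and is not a substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

# Listing A declares: <!ELEMENT Item (#PCDATA | Description)*>
ITEM_CHILDREN = {"Description"}

def fits_mixed_model(elem, allowed_children):
    """Approximate one level of a DTD mixed content check: character data may
    appear anywhere, and every child element must be one of the allowed names.
    Order and repetition are deliberately not checked, because the DTD cannot
    constrain them either. The children's own content is not checked here."""
    return all(child.tag in allowed_children for child in elem)

samples = {
    "text only": "<Item>Hyper Flange</Item>",
    "child only": "<Item><Description>A widget</Description></Item>",
    "interleaved and repeated": (
        "<Item>before<Description>one</Description>middle"
        "<Description>two</Description>after</Item>"
    ),
    "empty": "<Item/>",
    "undeclared child": "<Item><SKU>KKU8123</SKU></Item>",
}

for label, xml in samples.items():
    print(f"{label}: {fits_mixed_model(ET.fromstring(xml), ITEM_CHILDREN)}")
```

Every sample except the last passes: the model accepts Description anywhere and any number of times, and only an undeclared element such as SKU is rejected. If an application needs a rule like "at most one Description," it has to enforce that itself.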

The problem and the solution

Let's look at the problem in a little more detail and discuss a potential solution. Suppose we have an order, as described above, that might contain both "raw" content and structured XML content. Listing B shows an example of how an actual XML document might look.

Listing B: order.xml
<Order>
  <Item>
    <SKU>KKU8123</SKU>
    <Name>Super Widget</Name>
    <Description>A super widget device</Description>
    <PricePer>13.50</PricePer>
  </Item>
  <Item>
    8234556:Hyper Flange, $34.95
  </Item>
  <Item>
    Small metallic device for assisting in flotalating.
    <Name>Metallic Flotalator</Name>
    <PricePer>.50</PricePer>
  </Item>
</Order>

Our order contains multiple items, with the item data coming from different systems and in different formats. The good news is that we can use a mixed content element to describe each item without breaking our XML document.
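A program that consumes this order has to cope with all three shapes. As one hedged sketch, the Python code below parses Listing B with xml.etree.ElementTree and sorts each Item into a hypothetical category: structured (child elements only), raw (text only), or mixed (both). Whitespace-only text produced by indentation is ignored, which is an assumption about how the source systems format their output; the helper names loose_text and classify are illustrative.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<Order>
  <Item>
    <SKU>KKU8123</SKU>
    <Name>Super Widget</Name>
    <Description>A super widget device</Description>
    <PricePer>13.50</PricePer>
  </Item>
  <Item>
    8234556:Hyper Flange, $34.95
  </Item>
  <Item>
    Small metallic device for assisting in flotalating.
    <Name>Metallic Flotalator</Name>
    <PricePer>.50</PricePer>
  </Item>
</Order>"""

def loose_text(item):
    """Collect the Item's own character data (its .text plus each child's
    .tail), skipping whitespace-only runs left over from indentation."""
    parts = [item.text] + [child.tail for child in item]
    return " ".join(p.strip() for p in parts if p and p.strip())

def classify(item):
    """Label an Item by which kinds of content it actually holds."""
    has_children = len(item) > 0
    has_text = bool(loose_text(item))
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "structured"
    return "raw" if has_text else "empty"

order = ET.fromstring(ORDER_XML)
for n, item in enumerate(order.findall("Item"), start=1):
    print(n, classify(item), repr(loose_text(item)))

# Expected output:
# 1 structured ''
# 2 raw '8234556:Hyper Flange, $34.95'
# 3 mixed 'Small metallic device for assisting in flotalating.'
```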

Listing C extends the DTD above by declaring SKU, Name, Description, and PricePer as child elements of Item, while still allowing raw text. This DTD is enough to describe the XML document in Listing B: every Item in it, whether structured, raw, or a mix of both, matches the declaration.

Listing C: mixedcontent2.dtd
<!ELEMENT Order (Item+)>
<!ELEMENT Item (#PCDATA | SKU | Name | Description | PricePer)*>
<!ELEMENT SKU (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT PricePer (#PCDATA)>
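Even with a DTD that accepts every Item, the receiving application still has to decide how to turn each one into a usable record. The Python sketch below shows one possible approach, not part of the article's design: the field names come from Listing C, while the function to_record and the "raw" key for leftover text are hypothetical. Because the DTD cannot stop a child element from repeating, the sketch simply keeps the last occurrence.

```python
import xml.etree.ElementTree as ET

# Child elements that Listing C allows inside Item.
FIELDS = ("SKU", "Name", "Description", "PricePer")

ORDER_XML = """<Order>
  <Item>
    <SKU>KKU8123</SKU>
    <Name>Super Widget</Name>
    <Description>A super widget device</Description>
    <PricePer>13.50</PricePer>
  </Item>
  <Item>
    8234556:Hyper Flange, $34.95
  </Item>
  <Item>
    Small metallic device for assisting in flotalating.
    <Name>Metallic Flotalator</Name>
    <PricePer>.50</PricePer>
  </Item>
</Order>"""

def to_record(item):
    """Map one Item to a dict: declared children become fields, and any
    non-whitespace loose text is joined under the hypothetical key 'raw'."""
    record = {}
    for child in item:
        if child.tag not in FIELDS:
            raise ValueError(f"Item contains undeclared element <{child.tag}>")
        # The DTD allows repeats, so a later occurrence overwrites an earlier one.
        record[child.tag] = (child.text or "").strip()
    pieces = [item.text] + [child.tail for child in item]
    raw = " ".join(p.strip() for p in pieces if p and p.strip())
    if raw:
        record["raw"] = raw
    return record

for item in ET.fromstring(ORDER_XML).findall("Item"):
    print(to_record(item))
```

The sketch deliberately does not try to split a raw string such as 8234556:Hyper Flange, $34.95 into SKU, name, and price. That layout depends on whichever system produced it, which is exactly the kind of data the mixed content model lets you carry without committing to a structure.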

Summary

When working with XML data, you must often determine the best way to organize data that you don't have much control over. One solution to the problem of disparate data formats is to use a mixed content element, which can include both raw data and structured child elements.

Brian Schaffner is a senior consultant for Fujitsu Consulting. He provides architecture, design, and development support for Fujitsu's Telcom360 group.
