XML is generally thought of as a method for describing data using text. For example, elements are given text names, and element contents are usually text-based. There are times, however, when you’ll want to put data other than text into your XML documents. Let’s examine some of your options.

The problem
You might think that you can just drop some binary data between a start tag and an end tag and be done. Unfortunately, this can cause several problems:

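  • The parser normalizes line endings and may treat whitespace as insignificant, which alters the binary data.
  • Binary data may contain null characters, which are not legal anywhere in an XML document.
  • Binary data may contain <, &, or </ sequences, which the parser reads as markup.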

These problems affect both the binary data and the XML parsing. If the parser can’t figure out what’s going on, you won’t be able to extract any data. If the data is “formatted” by the parser, you won’t be able to process the binary data correctly.
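
The failures described above can be reproduced with a short sketch. It assumes Python's standard xml.etree.ElementTree parser; the <data> element name and the try_parse helper are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(payload: bytes) -> str:
    """Wrap raw bytes in a hypothetical <data> element and report the result."""
    document = b"<data>" + payload + b"</data>"
    try:
        element = ET.fromstring(document)
    except ET.ParseError as exc:
        return f"parse error: {exc}"
    return f"parsed, text={element.text!r}"


# A NUL byte is not a legal XML character, so the parser rejects the document.
print(try_parse(b"\x00\x01\x02"))

# A "</" sequence is read as the start of an end tag, which breaks parsing.
print(try_parse(b"abc</xyz"))

# A CR LF pair parses, but the parser normalizes it to a single line feed,
# so the bytes that come out are not the bytes that went in.
print(try_parse(b"line1\r\nline2"))
```

The first two calls fail outright, while the third succeeds but silently changes the data, which is arguably the harder problem to notice.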

The solutions
There are at least three solutions to this problem:

  • Embed the binary data directly in the XML document using a CDATA section.
  • Refer to the binary data using a URL.
  • Encode the binary data into a text-based format that can be set as the contents of an XML element.

Binary embedding
If you opt to embed the binary data in the XML document, you won’t have to pull the file from a remote source or decode it before using it. The data is available for immediate processing.

To employ this method, use an XML CDATA section, a special construct that marks a block of content the parser should not interpret as markup. Essentially, you use a start and end delimiter to signify where the binary data begins and ends. The value of the element containing the CDATA section will be the binary data. Listing A offers an example.

As you can see, a CDATA section uses the sequence <![CDATA[ as its start delimiter and the sequence ]]> as its end delimiter. The XML parser treats everything between the two as literal character data rather than markup.

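Unfortunately, this method has some problems. First, you may run into issues with the character set used by the XML document, parser, and your binary data: a CDATA section must still contain only characters that are legal in XML and valid in the document's encoding, so null characters and most other control characters are rejected. Second, your binary data may contain the ]]> sequence, which would indicate the end of the nonparsed data to the XML parser even though it's not the end of the binary data. In practice, this method is safe only for data that is already legal XML text.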
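
As a rough illustration of both the benefit and the limits of CDATA, the sketch below again assumes Python's xml.etree.ElementTree. The <data> element and the wrap_cdata helper are hypothetical. Splitting ]]> across two adjacent CDATA sections is a common workaround, not something the article's Listing A is known to do.

```python
import xml.etree.ElementTree as ET

# Markup-like characters are safe inside a CDATA section.
doc = "<data><![CDATA[if (a < b && c) { </notatag> }]]></data>"
print(ET.fromstring(doc).text)


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# The terminator sequence survives a round trip once it has been split.
tricky = "payload with ]]> inside"
parsed = ET.fromstring("<data>" + wrap_cdata(tricky) + "</data>")
print(parsed.text == tricky)

# CDATA does not lift the ban on characters such as NUL.
try:
    ET.fromstring(b"<data><![CDATA[\x00]]></data>")
    nul_rejected = False
except ET.ParseError as exc:
    nul_rejected = True
    print("still rejected:", exc)
```

The last case is why CDATA alone is not enough for arbitrary binary files: the parser still checks every character inside the section.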

Binary reference
Probably the easiest solution is to put the binary file on a network-accessible server and just refer to it by URL. Using the reference frees you from worrying about encoding the file or sending large files across the network with the XML. It also allows you to dynamically update the file without having to send a new XML document. Listing B shows an example.
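
A minimal sketch of the reference approach might look like the following. The <product> and <image> elements, the href and type attributes, and the example.com URL are all placeholders invented for illustration; they are not taken from Listing B.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the XML carries only a pointer to the binary file.
doc = """
<product>
  <name>Sample widget</name>
  <image href="http://www.example.com/images/widget.png" type="image/png"/>
</product>
"""

root = ET.fromstring(doc)
image = root.find("image")
url = image.get("href")
media_type = image.get("type")
print(url, media_type)

# Fetching the bytes is a separate step that needs network access, for example:
# from urllib.request import urlopen
# with urlopen(url) as response:
#     data = response.read()
```

The trade-off is that the document is no longer self-contained: whoever consumes it must be able to reach the server that hosts the file.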

Binary encoding
You can also choose from a handful of methods for encoding binary data as text data. Essentially, the process maps arbitrary bytes onto a limited set of printable ASCII characters using a relatively simple algorithm, at the cost of making the data about a third larger. The two most popular binary encoding algorithms are uuencode and Base64.

The MIME standard builds on these encodings by adding headers that describe the encoded file (such as its filename and content type). Encoding programs are easy to find as shareware and programming tools. Listing C shows some code that embeds a binary-encoded file in an XML document.
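
The encoding approach can be sketched with Python's standard base64 module. The <file> element, its name and encoding attributes, and the embed and extract helpers are hypothetical choices for this example, not the code from Listing C.

```python
import base64
import xml.etree.ElementTree as ET


def embed(payload: bytes, filename: str) -> bytes:
    """Build a hypothetical <file> element holding the Base64-encoded payload."""
    root = ET.Element("file", {"name": filename, "encoding": "base64"})
    root.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root)


def extract(document: bytes):
    """Parse the element and decode its text back into the original bytes."""
    root = ET.fromstring(document)
    return root.get("name"), base64.b64decode(root.text)


# Every possible byte value, including NUL, "<" and "&".
original = bytes(range(256))
xml_bytes = embed(original, "sample.bin")
name, restored = extract(xml_bytes)
print(name, restored == original)
```

Because the Base64 alphabet contains no characters that are special in XML, the encoded text needs no escaping and survives parsing unchanged.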

Putting data other than text in XML files can be useful in a variety of situations. But it's not as simple as just dropping data between a start tag and an end tag, which can lead to problems with both the data and the XML parsing. Fortunately, you can use one of three solutions (binary embedding, binary reference, or binary encoding) to successfully include data other than text in your XML documents.