Generating XML via Java

Most attention in the XML world focuses on parsing XML. The generation of XML documents is an often overlooked but necessary development task. Discover various approaches to XML generation in this article.

Most of the attention in the XML world focuses on parsing XML and walking an XML structure. The W3C provides the DOM and SAX specifications to parse data, Sun provides the Java XML Pack, and Apache has Xerces and Xalan. However, very little attention is paid to the techniques for XML output. Projects are looking into turning JavaBeans and Swing components into XML, but most of the time, developers simply want to output a data structure in a custom-formatted way.

This article presents several ways to create XML documents in Java. Each has different advantages: some are ridiculously simple, while others rely on heavy, powerful libraries. I’ll begin with the simplest.

Using the StringBuffer class
The simplest, and most commonly used, method for creating XML documents is to do it yourself. You can do so by using the StringBuffer class or some form of the Writer class. The advantages are that you don’t need any additional libraries and you create no extra objects. However, this approach has many disadvantages. Nothing checks that the XML is well formed. Quote characters in the markup must be escaped inside Java String literals. And nothing escapes XML special characters for you, such as replacing < with &lt;. Listing A provides an example.
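A minimal sketch of what such hand-rolled code might look like follows. It assumes a hypothetical Person class with a name and an optional age; the class and method names are illustrative and are not necessarily those used in Listing A.

```java
// Hypothetical sketch: building XML by hand with StringBuffer.
class StringBufferXmlSketch {

    // Illustrative business object (an assumption); age may be null.
    static class Person {
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    static String toXml(Person p) {
        StringBuffer buf = new StringBuffer();
        buf.append("<person name=\"");   // quotes escaped for the Java literal
        buf.append(p.name);              // note: no XML escaping happens here
        buf.append("\"");
        if (p.age != null) {             // only write the attribute when set
            buf.append(" age=\"").append(p.age).append("\"");
        }
        buf.append("/>");
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(new Person("Jon Smith", 21)));
        // prints <person name="Jon Smith" age="21"/>
    }
}
```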

The output of Listing A is:
<person name="Jon Smith" age="21"/>

The simple example from Listing A falls flat on its face when presented with an odd name, such as Jon "The Cat" Smith. The code will not escape the quote (") characters when retrieving the name, and the output will be erroneous:
<person name="Jon "The Cat" Smith" age="21"/>

It is difficult to keep track of the XML hidden inside the Java when trying to read the source code. Indeed, half of the errors of this development approach will come down to unclosed tags and bad handling of quotes. In short, the result will often be XML that is not well formed.
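If you stay with this approach, the quoting problem has to be handled by hand. A minimal sketch of such a helper follows. The class and method names are assumptions made for illustration, and it covers only the five predefined XML entities.

```java
// Hypothetical helper: escapes the five predefined XML entities by hand.
class XmlEscapeSketch {

    static String escape(String s) {
        StringBuffer buf = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  buf.append("&amp;");  break;
                case '<':  buf.append("&lt;");   break;
                case '>':  buf.append("&gt;");   break;
                case '"':  buf.append("&quot;"); break;
                case '\'': buf.append("&apos;"); break;
                default:   buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String name = escape("Jon \"The Cat\" Smith");
        System.out.println("<person name=\"" + name + "\" age=\"21\"/>");
        // prints <person name="Jon &quot;The Cat&quot; Smith" age="21"/>
    }
}
```

Every value written into the buffer still has to pass through the helper, and nothing enforces that, which is the weakness of the do-it-yourself approach.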

A cleaner and easier way: DOM
The next method is the Document Object Model (DOM) way. Given an object structure, you convert it to some form of XML-object structure and then traverse that structure and output it. Many types of structures are available, ranging from the Jakarta Element Construction Set (ECS) project's XML class to a full DOM with a DOM-compliant parser such as Xerces. The smaller versions often come with very simple methods to output the XML. Listing B shows an example in ECS.

Listing B offers a nice, concise way to output the data. In fact, you could merge the two output lines into one by appending the output method onto the new XMLDocument. This is a classic syntax pattern with ECS and is easy to work with. However, it does not nicely replicate the if-null protection for the person's age. To achieve this protection, you have to break the code as shown in Listing C.

ECS presents several advantages. You don’t have to escape quote characters ("). You don’t need to close the tags at all, because the objects take care of that for you, and any XML characters, such as < or >, are escaped to &lt; and &gt;. Also, ECS is the simplest of the DOM-style methodologies. Handling this style of code in W3C's DOM, JDOM, or Dom4J adds a layer of complexity, although W3C's DOM has the advantage of parser independence.
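For comparison, here is a minimal sketch of the same idea using the W3C DOM and the JAXP classes that ship with the JDK rather than ECS. The class and method names are assumptions made for illustration. It builds the object structure first and then serializes it with an identity Transformer, which takes care of closing the tag and escaping the quotes.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: DOM-style output with the JDK's own W3C DOM (not ECS).
class DomPersonSketch {

    static String toXml(String name, Integer age) {
        try {
            // Build the XML object structure in memory.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element person = doc.createElement("person");
            person.setAttribute("name", name);
            if (age != null) {                       // if-null protection for age
                person.setAttribute("age", age.toString());
            }
            doc.appendChild(person);

            // Traverse the structure and write it out.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build or write the document", e);
        }
    }

    public static void main(String[] args) {
        // The quotes in the name are escaped by the serializer;
        // attribute order in the output depends on the DOM implementation.
        System.out.println(toXml("Jon \"The Cat\" Smith", 21));
    }
}
```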

Among the disadvantages of outputting XML using ECS is the object structure. You must build it before writing things out. While doing so might be fine in most cases, you wouldn't want to be assembling this XML structure when outputting a large XML file. The same disadvantage holds for most other DOM-style methodologies.

ECS is much closer to the mark than the previous methodology of using simple Writers or StringBuffer classes. It has a big jar size, but only a small part is necessary to output XML. Its biggest failing is that it doesn't scale well to large documents. Even so, it beats out its brethren because they are all larger, heavier, and more complicated.

Great SAX
There is an alternative to the DOM style of XML processing: the Simple API for XML (SAX). It consists of a series of events, or callbacks, that are invoked on your code while the XML file is parsed. It is not much use when you want to output directly to Strings, but it can be used in an indirect, more complex way.

Code that outputs Strings can instead output SAX events. This is more powerful than just outputting Strings, and it can be added to a simple generic class that turns SAX events into XML.

Let’s look at an example that uses the following classes:
  • Person: A business object, described previously
  • PersonInputSource: Holds a Person object
  • PersonXMLReader: Knows how to turn a PersonInputSource into SAX events
  • XMLPrettyPrinter: A ContentHandler that turns SAX events into XML

The most important piece of code is in PersonXMLReader, as shown in Listing D.

The code in Listing D is the guts of how the Person object is turned into a series of SAX events. It is not the simplest thing to do. Transforming a Person to XML via SAX functionality is implemented by the top-level code in Listing E.
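The arrangement described above can be sketched as follows. This is not the code from Listings D and E. It assumes that PersonXMLReader extends the standard XMLFilterImpl helper so that only parse needs overriding, and it uses the JDK's TransformerHandler as a stand-in for XMLPrettyPrinter. The Person fields are likewise illustrative.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: turning a business object into SAX events.
class SaxPersonSketch {

    static class Person {
        final String name;
        final Integer age;
        Person(String name, Integer age) { this.name = name; this.age = age; }
    }

    // Holds a Person instead of a byte or character stream.
    static class PersonInputSource extends InputSource {
        final Person person;
        PersonInputSource(Person person) { this.person = person; }
    }

    // "Parses" a PersonInputSource by firing SAX events at the ContentHandler.
    static class PersonXMLReader extends XMLFilterImpl {
        @Override
        public void parse(InputSource input) throws SAXException {
            if (!(input instanceof PersonInputSource)) {
                throw new SAXException("Expected a PersonInputSource");
            }
            Person p = ((PersonInputSource) input).person;
            ContentHandler handler = getContentHandler();
            if (handler == null) {
                return; // nobody is listening
            }
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "name", "name", "CDATA", p.name);
            if (p.age != null) {
                attrs.addAttribute("", "age", "age", "CDATA", p.age.toString());
            }
            handler.startDocument();
            handler.startElement("", "person", "person", attrs);
            handler.endElement("", "person", "person");
            handler.endDocument();
        }
    }

    // Top-level plumbing: reader -> handler that serializes events as XML.
    static String toXml(Person p) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer()
                    .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            PersonXMLReader reader = new PersonXMLReader();
            reader.setContentHandler(serializer);
            reader.parse(new PersonInputSource(p));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not generate XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(new Person("Jon \"The Cat\" Smith", 21)));
    }
}
```

Because the reader only talks to a ContentHandler, the serializer can be swapped for any other handler without touching the event-firing code.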

Using SAX definitely gives you added power, because the events can be sent to any SAX handler instead of the XMLPrettyPrinter. However, complexity increases when you add the handler; SAX is a more complicated concept. In many cases, the simple approach is best.

Once the generic components are written (XMLPrettyPrinter, a generic InputSource, and an XMLParser object), the event firing is simplified. The output of a new XML structure requires only the parse method and the top-level plugging together of components.

Having an XML-output system around SAX events has a lot going for it, but it is not a quick, off-the-shelf approach.

Using an XmlWriter class
Finally, I present my own alternative, an XmlWriter class. The idea is to output XML using a technique that fills the niche between too simple and too complex.

The important design requirements are as follows:
  • Wrap a
  • Provide a Writer-like API
  • Take care of as much of the XML handling as possible
  • Avoid a large object structure
  • Allow the ECS chaining style

These requirements allow the XML to be written in two distinct styles. First, it can be written in the style of a java.io.Writer code snippet, as shown in Listing F.

Second, it can be written in the chained-method style of coding, just like ECS, because each write method returns the XmlWriter itself. Listing G gives an example.

In terms of performance, XmlWriter is lean and creates few other objects. It is functional, and it can handle basic XML snippets (but not comments, indentation, or doctypes). Most important, it is easy to use.

On the downside, as already mentioned, it doesn't handle comments, indentation, or doctypes. Unlike ECS, which closes tags when the XML object writes itself, XmlWriter requires you to call an endEntity method. This method will throw an XmlWritingException if it is called when there are no entities to end. Finally, a close method exists. It does not close the underlying writer object, but finishes any XML that is being written. Perhaps most important, it will throw an XmlWritingException if there are unended entities.
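To make the behavior described above concrete, here is a minimal sketch of a writer along these lines. It is not the author's XmlWriter: endEntity, close, and XmlWritingException are named in the text, while writeEntity, writeAttribute, writeText, the class name XmlWriterSketch, and the choice to make the exception a subclass of IOException are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

// Assumption: modeled here as a checked IOException subclass.
class XmlWritingException extends IOException {
    XmlWritingException(String message) { super(message); }
}

// Hypothetical sketch of a lightweight, chainable XML writer.
class XmlWriterSketch {
    private final Writer out;
    private final Deque<String> open = new ArrayDeque<String>(); // unended entities
    private boolean startTagOpen = false; // true while "<name ..." awaits ">" or "/>"

    XmlWriterSketch(Writer out) { this.out = out; }

    XmlWriterSketch writeEntity(String name) throws IOException {
        finishStartTag();
        out.write("<" + name);
        open.push(name);
        startTagOpen = true;
        return this;
    }

    XmlWriterSketch writeAttribute(String name, Object value) throws IOException {
        if (!startTagOpen) {
            throw new XmlWritingException("No start tag open for attribute " + name);
        }
        out.write(" " + name + "=\"" + escape(String.valueOf(value)) + "\"");
        return this;
    }

    XmlWriterSketch writeText(String text) throws IOException {
        finishStartTag();
        out.write(escape(text));
        return this;
    }

    XmlWriterSketch endEntity() throws IOException {
        if (open.isEmpty()) {
            throw new XmlWritingException("No entity to end");
        }
        String name = open.pop();
        if (startTagOpen) {           // nothing inside: write an empty element
            out.write("/>");
            startTagOpen = false;
        } else {
            out.write("</" + name + ">");
        }
        return this;
    }

    // Finishes the XML but leaves the underlying Writer open.
    void close() throws IOException {
        if (!open.isEmpty()) {
            throw new XmlWritingException("Unended entities: " + open);
        }
        out.flush();
    }

    private void finishStartTag() throws IOException {
        if (startTagOpen) {
            out.write(">");
            startTagOpen = false;
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        XmlWriterSketch xml = new XmlWriterSketch(sw);
        // Chained style: each write method returns the writer itself.
        xml.writeEntity("person")
           .writeAttribute("name", "Jon \"The Cat\" Smith")
           .writeAttribute("age", 21)
           .endEntity();
        xml.close();
        System.out.println(sw);
        // prints <person name="Jon &quot;The Cat&quot; Smith" age="21"/>
    }
}
```

The only state kept is a stack of open element names, so the memory cost stays small no matter how much XML is written.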

Many options are available for generating XML documents. XmlWriter is by no means the best tool for every XML creation task, but it can fill the gap between approaches that are too simple and those that are too heavy or too complex.
