Save time and prevent errors by converting data to XML format

Converting and manipulating data that's presented in varying formats can be time-consuming, inefficient, and prone to errors. One solution is to convert the data into an XML document. This hands-on example shows how you can accomplish this.

You often need to deal with data stored or transferred in various formats, from comma- or tab-separated files to more complicated ones, each of which needs its own parser. Writing and maintaining those parsers slows development and introduces errors. In addition, there is no guarantee that a given parser will produce output that is convenient for later processing or integration, especially with third-party software. One solution is to convert data from the most frequently used formats into an XML document that can then be saved, processed, or transformed into other formats.

A hands-on example
Many formats are available for storing, exporting, importing, and transferring data within or between software applications. The most common are delimited formats, such as comma- or tab-separated data, and the fixed-length format. Let's assume that you have an address book application that can export its entries in both the comma-separated and the fixed-length formats.

With a comma-delimited format, a comma separates each of the fields in each data record, as shown in Listing A. With a fixed-length data format, a standard width is expected for each field in the record. Listing B shows a file of address book records in the fixed-length format.
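To make the difference between the two formats concrete, here is a minimal sketch that renders one record both ways. The class name, method names, and the sample record are hypothetical and are not taken from the listings; the field widths of 10, 10, 30, and 10 characters are the ones this article uses for its fixed-length records.

```java
class RecordFormatsSketch {
    // Field widths used for the fixed-length records in this article.
    static final int[] WIDTHS = {10, 10, 30, 10};

    /** Joins the fields with commas to form one comma-delimited record. */
    static String toCommaDelimited(String[] fields) {
        return String.join(",", fields);
    }

    /**
     * Pads each field with trailing spaces up to its fixed width.
     * Assumption: no field is longer than its width (longer values are not truncated here).
     */
    static String toFixedLength(String[] fields, int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            sb.append(String.format("%-" + widths[i] + "s", fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical address book entry: firstName, lastName, email, phone.
        String[] record = {"Jane", "Doe", "jane.doe@example.com", "555-0100"};
        System.out.println(toCommaDelimited(record));
        // Brackets make the trailing padding of the fixed-length record visible.
        System.out.println("[" + toFixedLength(record, WIDTHS) + "]");
    }
}
```

In the fixed-length form every record has the same total length, so a parser can cut fields by position instead of searching for a separator.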

Creating the XML document
Now, let's start parsing and converting input data into an XML document. The XML document is represented by the org.w3c.dom.Document interface, which is the root of the document tree in the Document Object Model (DOM) and provides the primary access to the document's data.
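As a starting point, an empty org.w3c.dom.Document can be created with the standard JAXP classes. The sketch below is not the code from the listings; the class name, the method name, and the root element name document are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EmptyDocumentSketch {
    /** Creates a new DOM document that contains only a root element. */
    static Document newDocument(String rootName) throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        // A DOM document can have only one root element; all data goes under it.
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        // "document" is a hypothetical root name chosen for this sketch.
        Document doc = newDocument("document");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```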

You can create a document from your data by executing the buildDocument(InputStream is) method, shown in Listing C. The method reads the data stream line-by-line and parses each line according to the format specified.
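The line-by-line reading loop might look like the following sketch. It is not Listing C: the class and method names are hypothetical, the line parser is passed in as a function so the loop stays independent of the format, and the UTF-8 encoding and the skipping of blank lines are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class LineReaderSketch {
    /**
     * Reads the stream one line at a time and turns each line into tokens
     * using the supplied parser.
     */
    static List<String[]> readRecords(InputStream is, Function<String, String[]> lineParser)
            throws IOException {
        List<String[]> records = new ArrayList<>();
        // Assumption: the input is UTF-8 text.
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // Assumption: blank lines carry no record.
                }
                records.add(lineParser.apply(line));
            }
        }
        return records;
    }
}
```

Passing the parser as a parameter means the same loop can serve both the delimited and the fixed-length formats.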

To parse delimited data, create an instance of the class with the constructor PlainTextToXmlFormatter(String[] colName, String delim), where the delimiter can be any string. For the fixed-length format, use the second constructor, PlainTextToXmlFormatter(String[] colName, int[] colLen), which takes an array of field lengths as its second parameter. The field lengths in our records are 10, 10, 30, and 10 characters, respectively. The colName parameter is the array of names of the fields you want to parse. In this example, the names are firstName, lastName, email, and phone.

The actual parsing of a data line into tokens is done by the getStringArray(String read, String delim) and parseFixedLengthMessage(String read, int[] colLen) methods. Each returns an array of Strings by splitting the input line into as many tokens as there are column names. If a line does not match the expected format, an exception is thrown and parsing stops. To continue parsing past malformed lines, call setSkipError(true): exceptions are then suppressed, and an error message is printed to the error output stream instead.
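The two splitting strategies and the skip-on-error behavior can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the class and method names are hypothetical, a malformed line is reported with IllegalArgumentException, and fixed-length fields are assumed to be padded with spaces that can be trimmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LineParsingSketch {
    /** Splits a delimited line and checks that it has the expected number of fields. */
    static String[] splitDelimited(String line, String delim, int expected) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
        String[] tokens = line.split(Pattern.quote(delim), -1);
        if (tokens.length != expected) {
            throw new IllegalArgumentException(
                    "Expected " + expected + " fields but found " + tokens.length + ": " + line);
        }
        return tokens;
    }

    /** Cuts a fixed-length line into fields by position and trims the padding. */
    static String[] splitFixedLength(String line, int[] widths) {
        int total = 0;
        for (int w : widths) {
            total += w;
        }
        if (line.length() < total) {
            throw new IllegalArgumentException(
                    "Expected at least " + total + " characters but found " + line.length());
        }
        String[] tokens = new String[widths.length];
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            tokens[i] = line.substring(pos, pos + widths[i]).trim();
            pos += widths[i];
        }
        return tokens;
    }

    /** Parses delimited lines; with skipErrors set, bad lines are reported and skipped. */
    static List<String[]> parseAll(List<String> lines, String delim, int expected,
            boolean skipErrors) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            try {
                records.add(splitDelimited(line, delim, expected));
            } catch (IllegalArgumentException e) {
                if (!skipErrors) {
                    throw e; // Stop at the first malformed line.
                }
                System.err.println("Skipping line: " + e.getMessage());
            }
        }
        return records;
    }
}
```

Checking the token count against the number of column names catches malformed lines early, before any partially filled element reaches the XML document.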

Once a line has been parsed into tokens, the tokens are added to the XML document as elements. Each input line becomes an element named line by default, or named as specified with the setDataLineName() method. Each field in the line becomes a child of that element and takes the corresponding column name supplied to the class constructor.
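With the standard DOM API, adding one parsed line to the document could look like this sketch. The class and method names and the root element name document are assumptions; the line element name and the column names firstName, lastName, email, and phone come from the article, and the sample values are made up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class LineElementSketch {
    /** Creates a document with a single root element. */
    static Document newDocument(String rootName) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement(rootName));
        return doc;
    }

    /** Appends one line element whose children are the named columns. */
    static Element appendLine(Document doc, String lineName, String[] colNames, String[] tokens) {
        Element line = doc.createElement(lineName);
        for (int i = 0; i < colNames.length; i++) {
            Element column = doc.createElement(colNames[i]);
            // The token becomes the text content of the column element.
            column.appendChild(doc.createTextNode(tokens[i]));
            line.appendChild(column);
        }
        doc.getDocumentElement().appendChild(line);
        return line;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = newDocument("document"); // hypothetical root name
        String[] colNames = {"firstName", "lastName", "email", "phone"};
        String[] tokens = {"Jane", "Doe", "jane.doe@example.com", "555-0100"}; // made-up record
        appendLine(doc, "line", colNames, tokens);
        System.out.println(doc.getElementsByTagName("line").getLength());
    }
}
```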

After the input data has been fully read, you have a well-formed XML document that you can process further. The data is now easy to manipulate because it has a well-known, tree-like structure. You can, for example, pass this document to a third party, who can easily work with it if they know the document's Document Type Definition (DTD). You can also save the document to a file by calling the writeDocument(Document doc, OutputStream osOut) method. Listing D shows an example of an XML document saved to a file.
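One common way to serialize a DOM document to a stream is an identity transformation from the javax.xml.transform package. The sketch below takes the same two parameters as writeDocument, but it is only an assumption about how such a method could be written; the class name, the indentation setting, and the UTF-8 encoding are choices made for this example.

```java
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class WriteDocumentSketch {
    /** Serializes the DOM tree to the given stream as XML text. */
    static void write(Document doc, OutputStream out) throws TransformerException {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }
}
```

To save to a file, pass a java.io.FileOutputStream as the second argument and close it when the call returns.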

Viewing data with XSLT transformation
You can also transform your XML data into other formats and present the content in different views. The easiest way to do this is to apply an XSLT transformation. XSLT is a tree-oriented transformation language that converts XML written in one vocabulary into plain text, HTML, or XML written in another vocabulary.

You use the XSLT language to create desired output from a given XML input. For instance, you can convert XML data into an HTML document by executing transformData(InputStream xmlIn,InputStream xslIn,OutputStream transfOut). Listing E shows an example of an XSLT transformation, and Listing F shows the HTML view of the address book entries.
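Applying a stylesheet with the standard JAXP API takes only a few lines. The following sketch uses the same parameter order as transformData, but it is an assumption about how such a method could be written, not the article's code; the class and method names are hypothetical.

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltTransformSketch {
    /** Applies the stylesheet read from xslIn to the XML read from xmlIn. */
    static void transform(InputStream xmlIn, InputStream xslIn, OutputStream out)
            throws TransformerException {
        // The stylesheet is compiled into a Transformer, which is then run on the XML input.
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(new StreamSource(xslIn));
        transformer.transform(new StreamSource(xmlIn), new StreamResult(out));
    }
}
```

The output format (HTML, plain text, or XML) is decided by the stylesheet, typically through its xsl:output element, so the same method serves every view.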

Simplify data manipulation
We have examined the PlainTextToXmlFormatter class and seen how to convert data from frequently used formats to an XML document. We've also seen how to present the XML document in different views using an XSLT transformation. These techniques provide a good solution to the problems you may run into when working with data of varying formats, saving you time and reducing the likelihood of errors.

