Developer

Database consultants can't wait FOR XML

Benefits of using XML-enabled database applications


XML has penetrated every aspect of business applications, from data interchange to presentation logic using tools like XSLT. It seems that if an application doesn't use XML, something is missing. But none of this is enough to unseat the reigning champion of storage: the database.

Luckily, Microsoft has offered a solution that bridges the chasm between the database and XML in the form of SQL-XML support, available in SQL Server 2000 and since extended in several update packs. In the past, you'd have to use a custom library to convert your data to XML form, but now you can get away (in most cases) with using built-in, server-optimized XML generation.

I was recently asked to help architect and develop press room management software for a new company. The resulting solution, MyPressroom.com, is intended to give small businesses a professional press room presence using Web-based administration. One of the features of the tool lets customers manage their media contact list as well as press clippings. I knew that I needed to design a database to store all information, but I wanted to see if I could use XML to facilitate the overall application design.

I ended up using SQL-XML and the FOR XML clause to create XML document versions of my data, which are then transformed into a presentation using XSLT. Since my application uses XML as its main document interchange format, I could easily extend it to use a simple contact manager. Sound interesting? Let's see what FOR XML can do for your applications.

Housekeeping before we start
SQL Server 2000 shipped with built-in support for XML, including the FOR XML clause. However, Microsoft knew that it would want to expand the support over time. To address this, it has continued to release SQL-XML update packs that fix issues and offer new functionality. In fact, some of the issues mentioned in this article are fixed by the most recent SQL-XML release (3.0 at the time of this writing.)

I've chosen to write this article using only the built-in support in SQL Server 2000. That enables the developer to avoid worrying about exact installation logistics at the remote site. However, if possible, I highly recommend working with the latest SQL-XML release and installing it on the target deployment server.

Getting started
FOR XML is a simple extension to the built-in T-SQL query language available in SQL Server 2000. That means that you have to be familiar with T-SQL syntax before using this tool. Since I can't cover both T-SQL and FOR XML in one article, I'll assume that familiarity.

A typical T-SQL query might look like this:
SELECT * FROM tblPR_Contacts ORDER BY LastName ASC

This query will generally return the information in a rowset. To convert that to XML, you need only append the keywords FOR XML AUTO, resulting in the following T-SQL statement:
SELECT * FROM tblPR_Contacts ORDER BY LastName ASC FOR XML AUTO

If you create a Contacts table with some simple fields (LastName, FirstName, Phone), you can run the above query and get the following output:
<tblPR_Contacts LastName="Jones" FirstName="Kelly" Phone="555-555-5555"/>
<tblPR_Contacts LastName="Smith" FirstName="Tom" Phone="555-555-5555"/>


Notice a few things about this output. First, the resulting XML text is a fragment and doesn't include the header and root elements required for a complete XML document. If you need a complete XML document, you'll have to create it either by wrapping the text with the right headers or using the XMLDocument object within the .NET framework. Second, the resulting XML fragment uses the table name to define the XML tag for each entry and the column names to define each attribute name. In many cases, your table and column names won't match your desired output format. In the above example, you probably want the tag name to be Contact, not tblPR_Contacts. You can achieve this result by aliasing entries in the table and column names, like so:
SELECT LastName, FirstName, Phone [BusinessPhone]
FROM tblPR_Contacts [Contact]
ORDER BY LastName ASC
FOR XML AUTO


which results in:
<Contact LastName="Jones" FirstName="Kelly" BusinessPhone="555-555-5555"/>
...


Constructing a complete XML document
As I mentioned, I decided to use FOR XML to create XML documents that I could then transform with XSLT into a presentation. The .NET XML object requires you to provide it with a correctly structured XML Document. To achieve this, you'll have to extract the XML fragment from FOR XML and wrap it with text to create the complete document. If that wasn't enough, you must address another complication. The SQLDataReader object doesn't guarantee the number of Read() calls (one or more) it will take to get at the complete XML fragment text. So you have to reconstruct the fragment piece-wise. It's not that hard, so let's look at some code:
// Create and open the connection
SqlConnection sc = new SqlConnection(QueryDataSource);
sc.Open();

// Create and execute the command and reader
SqlCommand scmd = new SqlCommand(QueryDataString, sc);
SqlDataReader dr = scmd.ExecuteReader();

// Use the StringBuilder to get the fragment bits.
// This is more efficient than appending to a string using str += "..."
StringBuilder xml_text = new StringBuilder();
xml_text.Append("<?xml version=\"1.0\" encoding=\"utf-8\" ?><document>");
while (dr.Read())
  xml_text.Append(dr.GetString(0));
xml_text.Append("</document>");

// Clean up the objects
dr.Close();
sc.Close();



The comments should make most of the code self-explanatory. Once you have the complete XML document text, you can use it in your applications or pump it into an XML object using the DocumentContent property.

Doing more with FOR XML - Nested XML tags
In some cases, you may want to create an XML document that nests multiple objects (or data rows) using tag notation. In this example, you might want to output XML that shows all of the contacts grouped by company. Again, the FOR XML support makes this easy. FOR XML assumes that any rows joined in the query are to be displayed in a master-child nested arrangement. Let's look at an example:
SELECT Company.FullName, Contact.*
FROM tblPR_Companies [Company]
INNER JOIN tblPR_Contacts [Contact]
ON Company.[ID]=Contacts.[idCompany]
FOR XML AUTO


The resulting XML fragment looks like this:
<Company FullName="Company One, Inc.">
  <Contact LastName="Jones" .../>
  <Contact LastName="Smith" .../>
</Company>
<Company FullName="Company Two, LLC.">
  <Contact ...>
</Company>


You can see how quickly you can convert a well-architected, normalized database into a structured XML document.

Attributes or elements
So far, you've seen how to retrieve column values as attributes in the XML fragment. But what if you want those columns to be returned as elements? A simple solution is to use the ELEMENTS modifier in the form of ... FOR XML AUTO, ELEMENTS. This modifier tells FOR XML to return every single value inside an element, as in:
<Contact>
  <LastName>Jones</LastName>
  ...
</Contact>


This may or may not be acceptable. To fully address this issue, you must use the EXPLICIT mode of FOR XML, which allows you to specify every aspect of the resulting XML fragment.

Explicit definition of the XML output
As you've seen, the AUTO mode of the FOR XML clause is quite powerful and will often give you the results you're looking for. However, in some cases, you may find that you want to define the exact format of the XML fragment generated. The EXPLICIT mode gives you just such control, although at the expense of complexity. I'll offer a brief introduction to the EXPLICIT mode and direct you to a detailed review provided by Microsoft in "Using EXPLICIT Mode."

The output of the EXPLICIT mode is defined by a universal table. The column names and values within the universal table instruct the processor as to the form of the results. This mode requires two metadata columns to be included in the universal table. These two columns define the parent-child hierarchy of the XML document:
  • The first column must be titled Tag and be of integer type. The values of this column in each row define a unique identifier for every tag that will be output with the results.
  • The second column must be titled Parent and also be of integer type. A value of NULL instructs the processor that the tag is a top-level tag. Any other value generates a nested tag under the appropriately identified parent entry.

The rest of the columns in the universal table are named according to the following convention: ElementName!TagNumber!AttributeName!Directive. An example of a column name might be Company!1!FullName, which creates an element called Company for tag number 1 and defines the value of this column to be output as an attribute titled FullName. If you specified that tag number 2 has a parent tag number 1, then another column might be Contact!2!BusinessPhone, which defines the BusinessPhone attribute for your contact, which will be nested within the Company.

The Directive component of the column name is used to instruct the processor to do certain things with the output. For example, you can designate a particular column with the "element" directive and force the output to generate a full XML element vs. an attribute. Other directives are available for more sophisticated output forms.

The rows within the universal table must be sequentially ordered in a depth-first fashion, where each row represents the information in one level of the hierarchy. Nested tags are automatically opened and closed according to the left-most column of each object set. This may sound complicated; Table A provides an example.
 
Table A
Tag Parent Company!1!
ID
Company!1!
FullName
Contact!2!
LastName
Contact!2!
First Name
1 Null 101 Company One, Inc Null Null
2 1 101 Null Smith Kelly
2 1 101 Null Jones Tom
1 Null 102 Company Two, LLC. Null Null
2 1 102 Null Simon Paul

Since the universal table needs to have a series of numbers for tags and parents (generally in single increments), you may find that using the UNION clause of T-SQL is your best answer. Also, it's the responsibility of the developer to correctly define the universal table to generate valid XML. I recommend using columnar view first to ensure the validity of the data before applying the FOR XML EXPLICIT modifier.

Caveats
The FOR XML support, while impressive, has a few issues that are worth noting before extensive use. For example, FOR XML is not valid in any subselections or subqueries, so you cannot use it to generate an XML fragment column in a larger query. FOR XML is also not valid within a view, so you cannot create a view that returns XML directly. There are several other caveats that you should review or you might find yourself frustrated at the outcome. Microsoft has put together a complete list in "Guidelines for Using the FOR XML Clause." I recommend reading it at least once to familiarize yourself with the issues.

Editor's Picks

Free Newsletters, In your Inbox