
Use reduced XML schemas to trim BizTalk processing overhead

XML processing is convenient but not always efficient. If XML processing is at the core of your BizTalk application, consider schema reduction for high throughput operations.

It's hard to argue against the convenience, robustness and flexibility of XML data transport in extended application systems. The people behind BizTalk Server 2004 have made it BizTalk's lifeblood and it serves well. It's possible to do some pretty intricate, even convoluted, detail processing using XML schemas as data repositories.

Transitory table look-ups

One such use is transitory table look-ups. It is possible to bundle table-value-bearing XML documents into solutions that do high-volume processing, and thus eliminate repetitive database calls.
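As a sketch of the idea (the table document and its element names here are hypothetical, not from any particular BizTalk solution), a table-value-bearing XML document can be parsed once and cached in memory, so each validation becomes a hash lookup rather than a database round-trip:

```python
import xml.etree.ElementTree as ET

# Hypothetical table-value document bundled with the solution,
# standing in for a database look-up table of currency codes.
TABLE_XML = """
<TableValues name="CurrencyCode">
  <Entry code="USD" description="US Dollar"/>
  <Entry code="EUR" description="Euro"/>
  <Entry code="JPY" description="Japanese Yen"/>
</TableValues>
"""

def load_table(xml_text: str) -> dict:
    """Parse the table document once; cache it as a dict so every
    subsequent validation is an in-memory lookup, not a DB call."""
    root = ET.fromstring(xml_text)
    return {e.get("code"): e.get("description") for e in root.findall("Entry")}

currency_table = load_table(TABLE_XML)
assert "USD" in currency_table      # valid value
assert "XYZ" not in currency_table  # rejected without a database call
```

The trade-off the article goes on to describe is exactly this: the repetitive database calls disappear, but the parsing and searching of the XML documents themselves becomes the new overhead.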

However, if you're doing this kind of data-crunching in BizTalk, you can create one problem in the course of solving another. You may be eliminating calls to a database look-up table, but you're piling overhead into a BizTalk orchestration that is already expensive to run.

For example, in electronic data interchange (EDI), incoming messages are delivered in a highly specific, industry-sanctioned format, and this format almost always requires meticulous decoding. Table look-ups, as you might guess, are numerous.

BizTalk was born for this sort of thing, and in terms of elegance it can't be matched: all these look-ups can be done from local XML documents rather than database tables. For most forms of EDI messaging, a BizTalk solution will contain XSD schemas defining the format's valid segments (record types), data values (the data types for each segment's elements), and table values (the values considered legal for data values so defined), plus a control structure schema that defines how an XML document must implement the message construction rules of the EDI format.

When an EDI message of that format is decoded (or encoded, for that matter), every element falls through this tree of processing for validation, interpretation and structured reformatting. As you might guess, that's a ferocious amount of processing for even a single-byte value, and a typical message contains hundreds of such values—with a typical day's processing including potentially thousands of such messages.

How can you reduce this huge overhead burden? Map the huge schemas down into small ones, and collapse multiple schemas down to a single schema where possible.


BizTalk Mapper Utility

These techniques, used in the context of a BizTalk orchestration, utilize the BizTalk Mapper utility. A download explaining how to use this utility is available from the TechRepublic Download Center.


Create shortcuts to XML data elements

The data passing through BizTalk orchestrations typically consists of many instances of records or documents from disparate sources. For this reason, the schemas instanced as XML documents to catch and transport those documents are typically huge: they must be able to interpret and store the smallest anticipated sub-element among myriad record types in a document with multiple loops.

Extracting data from such a document, especially when your orchestration processes them in huge volumes, is in some ways wasteful. XML documents are convenient because they contain internal structures to accommodate many different configurations of a particular aggregate data structure or document; they are wasteful because the vast majority of those structures go unused, yet must still be searched through in any operation that extracts data from the structures that do get used.

In short, in a BizTalk processing environment, XML documents will tend to be many, and focused on instancing real-world documents of a pretty specific nature. How can this work to your advantage?

Re-map the XML document down to a leaner, meaner schema if the resulting document is going to undergo heavy processing by multiple processes. Create a schema that re-defines the document, using only the segments and elements that you know other systems/applications are going to send you in the real world. If there are 800 potential elements in, for instance, an inbound purchase order for your particular industry, create a reduced schema that uses only the 100 elements you know your customers are going to use. The resulting remapping chews up some cycles, but you'll recover far more cycles in the reduced processing load on the new XML document.
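The re-mapping step can be sketched outside BizTalk as a simple whitelist copy. This is an illustrative sketch only: the purchase-order elements and the list of elements to keep are assumptions, standing in for the hundreds of optional elements a real industry schema would define.

```python
import xml.etree.ElementTree as ET

# Hypothetical inbound purchase order; a real industry schema might
# define hundreds of optional elements, most of them never sent.
FULL_PO = """
<PurchaseOrder>
  <OrderID>1001</OrderID>
  <BuyerFax>555-0100</BuyerFax>
  <ShipTo>Louisville</ShipTo>
  <LegacyRoutingCode>A7</LegacyRoutingCode>
</PurchaseOrder>
"""

# The elements your trading partners actually use (assumed list).
KEEP = {"OrderID", "ShipTo"}

def reduce_document(xml_text: str, keep: set) -> ET.Element:
    """Re-map a document onto a leaner structure by copying only
    the elements the reduced schema defines."""
    src = ET.fromstring(xml_text)
    dst = ET.Element(src.tag)
    for child in src:
        if child.tag in keep:
            dst.append(child)
    return dst

reduced = reduce_document(FULL_PO, KEEP)
# The reduced document now carries only OrderID and ShipTo; every
# downstream process searches a fraction of the original structure.
```

In BizTalk itself this is a map built with the BizTalk Mapper: the one-time remapping cost is paid on receipt, and every downstream process works against the smaller document.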




Use one schema where three will do

Trimming down your XML documents with a reduced schema can be a useful first step, but where processing overhead can really pile up—especially in the world of standards compliance—is in table look-ups that confirm the validity of specific data elements.

Here's another area where the flexibility of XML schemas serves you well, but also costs you a lot in processing. In the EDI example above, there's a data validation hierarchy in place, with a data-bearing XSD at each level: { segments { data values { table values } } }. To validate any specific table value, the XML document is parsed by segment; within each segment, every element is checked for data-value validity (cross-referenced with the elements defined in the segments schema); and if a data value is defined as a table type, the element must be compared with the entries in the table values document for that type.

You can see how this piles up, and in standards compliance, it is worse than you might imagine. For instance, I recently worked on a Health Level 7 (HL7) application that processed inbound patient admission, discharge, and transfer (ADT) documents. Within an IN1 (insurance information) segment alone, among the elements that included only the Insurance Company ID, name, address, phone number and contact, there were nineteen table look-ups. That's just for one small section of an individual inbound document.

What can you do? Realizing that XSD schemas written for XML data validation are usually exhaustive, you can practice another, more direct form of schema reduction with these two efficient steps:

  • Throw out all the segments, data values and table values that are never used. Strip the XML documents of all unnecessary information. This is a powerful step, since these schemas will generally be used in processing every inbound XML document.
  • Condense your validation schemas into a single schema. Once you've eliminated validation information that isn't necessary in your particular application, go the extra mile: create a single schema to do the work of the schemas it replaces.

In the example above, you would be nesting the table values schema within the data values schema within the segments schema, by restructuring them as nodes. That is, table values become child elements of a data value node (when the data value node defines a table type), which in turn belongs to a parent node that specifies the segment. Will there be redundancy? Yes, quite a bit. Will it be a lot of work? You bet. But your application performance will go to warp speed.
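The condensed structure can be sketched as follows. The segment, data value, and table entries here are hypothetical (loosely modeled on the HL7 IN1 example above); the point is that one walk down a single nested document replaces three separate schema look-ups.

```python
import xml.etree.ElementTree as ET

# Hypothetical condensed schema-as-document: table values nested as
# children of their data value, which sits under its parent segment.
CONDENSED = """
<Segments>
  <Segment id="IN1">
    <DataValue name="InsuranceCompanyID" type="table">
      <TableValue>BCBS01</TableValue>
      <TableValue>AETNA7</TableValue>
    </DataValue>
    <DataValue name="InsuranceCompanyName" type="string"/>
  </Segment>
</Segments>
"""

def is_valid(root, segment, field, value) -> bool:
    """One path walk replaces three separate schema look-ups."""
    dv = root.find(f"./Segment[@id='{segment}']/DataValue[@name='{field}']")
    if dv is None:
        return False
    if dv.get("type") != "table":
        return True  # non-table types fall through to ordinary type checks
    return any(tv.text == value for tv in dv.findall("TableValue"))

root = ET.fromstring(CONDENSED)
assert is_valid(root, "IN1", "InsuranceCompanyID", "BCBS01")
assert not is_valid(root, "IN1", "InsuranceCompanyID", "UNKNOWN")
```

The redundancy the article mentions shows up here: a table type used by several data values gets its entries repeated under each one. That duplication is the price of collapsing three searches into one.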

About Scott Robinson

Scott Robinson is a 20-year IT veteran with extensive experience in business intelligence and systems integration. An enterprise architect with a background in social psychology, he frequently consults and lectures on analytics, business intelligence...
