It’s hard to argue against the convenience, robustness and
flexibility of XML data transport in extended application systems. The people
behind BizTalk Server 2004 have made it BizTalk’s lifeblood and it
serves well. It’s possible to do some pretty intricate, even convoluted, detail
processing using XML schemas as data repositories.

Transitory table look-ups

One such use is transitory table look-ups. It is possible to
bundle table-value-bearing XML documents into solutions that do high-volume processing,
and thus eliminate repetitive database calls.
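As a sketch of the idea (the element and table names here are hypothetical, not drawn from any particular EDI standard), a table-value-bearing XML document can be parsed once and cached, so each message element costs a dictionary hit rather than a database round trip:

```python
import xml.etree.ElementTree as ET

# Hypothetical table-values document: code/description pairs that
# would otherwise live in a database look-up table.
TABLE_XML = """
<TableValues table="StateCodes">
  <Entry code="CA" value="California"/>
  <Entry code="NY" value="New York"/>
  <Entry code="TX" value="Texas"/>
</TableValues>
"""

def load_table(xml_text):
    """Parse the table-values XML once and cache it as a dict."""
    root = ET.fromstring(xml_text)
    return {e.get("code"): e.get("value") for e in root.findall("Entry")}

STATE_CODES = load_table(TABLE_XML)  # one parse, reused for every message

def lookup_state(code):
    # No database round trip: a dictionary hit per message element.
    return STATE_CODES.get(code)
```

The parse happens once per process, not once per message, which is where the repetitive database calls are recovered.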

However, if you’re doing this kind of data-crunching in
BizTalk, you can create one problem for yourself in the course of solving
another. You may be eliminating calls to a database look-up table, but you’re
piling overhead onto a BizTalk orchestration that already carries plenty of
its own.

For example, in electronic
data interchange
(EDI), incoming messages are delivered in a
highly-specific industry-sanctioned format, and this format almost always
requires meticulous decoding. Table look-ups, as you might guess, are numerous.

BizTalk was born for this sort of thing and in terms of
elegance, it can’t be matched: all these look-ups can be done from local XML
documents, rather than database tables. In most forms of EDI messaging, a BizTalk
solution will contain XSDs defining the format’s valid segments (record types),
data values (data types for each segment’s elements), and
table values (the values considered legal for data values so defined),
plus a control structure schema that defines how an XML document following the
message construction rules of the EDI format must be implemented.

When an EDI message of that format is decoded (or encoded,
for that matter), every element falls through this tree of processing for
validation, interpretation and structured reformatting. As you might guess,
that’s a ferocious amount of processing for even a single-byte value, and a
typical message contains hundreds of such values—with a typical day’s
processing including potentially thousands of such messages.

How can you reduce this huge overhead burden? Map the huge
schemas down into small ones, and collapse multiple schemas down to a single
schema where possible.


BizTalk Mapper Utility

These techniques, used in the context of a BizTalk
orchestration, utilize the BizTalk Mapper utility. A download
explaining how to use this utility is available from the TechRepublic Download
Center.


Create shortcuts to XML data elements

Data passing through BizTalk orchestrations typically
consists of many instances of records or documents from disparate sources. For
this reason, the schemas instanced as XML documents to catch and transport such
documents are typically huge, able to interpret and store the smallest
anticipated sub-element among myriad record types in a document with multiple
loops.

Extracting data from such a document, especially when your
orchestration is processing a huge volume of them, is in some ways a wasteful
process: XML documents are convenient because they contain internal structures
to accommodate many different configurations of a particular aggregate data
structure or document, but are wasteful in that the vast majority of these
structures go unused, and must be searched through in any operation that
extracts data from those structures that do get used.

In short, in a BizTalk processing environment, XML documents
will tend to be many, and focused on instancing real-world documents of a
pretty specific nature. How can this work to your advantage?

Re-map the XML document down to a leaner, meaner schema if
the resulting document is going to undergo heavy processing by multiple
processes. Create a schema that re-defines the document, using only the
segments and elements that you know other systems/applications are going to
send you in the real world. If there are 800 potential elements in, for
instance, an inbound purchase order for your particular industry, create a
reduced schema that uses only the 100 elements you know your customers are
going to use. The resulting remapping chews up some cycles, but you’ll recover
far more cycles in the reduced processing load on the new XML document.
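The remapping step can be sketched as a simple recursive filter (element names and the `KEEP` set here are hypothetical stand-ins for the reduced schema your customers actually exercise):

```python
import xml.etree.ElementTree as ET

# Hypothetical inbound purchase order: only PONumber and Qty appear
# in the reduced schema; InternalRouting is one of the many unused elements.
FULL = ("<PO><PONumber>123</PONumber>"
        "<InternalRouting>X</InternalRouting>"
        "<Qty>5</Qty></PO>")

# The elements customers are known to send -- the "100 of 800".
KEEP = {"PO", "PONumber", "Qty"}

def reduce_document(elem):
    """Copy elem, keeping only child elements whose tags are in KEEP."""
    slim = ET.Element(elem.tag, elem.attrib)
    slim.text = elem.text
    for child in elem:
        if child.tag in KEEP:
            slim.append(reduce_document(child))
    return slim

slim = reduce_document(ET.fromstring(FULL))
```

Every downstream process that touches `slim` now traverses the reduced element set instead of the full one, which is where the recovered cycles come from.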


Additional white paper resources

“Overview of Native XML Web Services for Microsoft SQL
Server 2005”

“BizTalk Server 2004 – Architecture”

“BizTalk Server 2004 Business Rules Framework”


Use one schema where three will do

Trimming down your XML documents with a reduced schema can
be a useful first step, but where processing overhead can really pile up—especially
in the world of standards compliance—is in table look-ups that confirm the validity
of specific data elements.

Here’s another area where the flexibility of XML schemas
serves you well, but also costs you a lot in processing. In the EDI example
above, there’s a data validation hierarchy, with a data-bearing XSD at
each level: { segments { data values { table values } } }.
To validate any specific table value, the XML document is parsed by segment;
within each segment, every element is checked for data value validity
(cross-referenced with the elements in the segment), and if a data value is
defined as a table type, the element must be compared with the entries in the
table values XML document for that type.
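The three-level chain can be sketched as follows, with the three XSDs modeled as plain dictionaries and hypothetical HL7-flavored names standing in for real segment and table definitions:

```python
# Look-up 1: which elements each segment may contain (segments schema).
SEGMENTS = {"IN1": ["InsCoID", "InsCoName", "PlanType"]}

# Look-up 2: the data type of each element (data values schema).
DATA_VALUES = {
    "InsCoID": "string",
    "InsCoName": "string",
    "PlanType": "table:PlanTypes",  # a table type triggers look-up 3
}

# Look-up 3: legal values for each table type (table values schema).
TABLE_VALUES = {"PlanTypes": {"HMO", "PPO", "POS"}}

def validate(segment, element, value):
    """Walk the { segments { data values { table values } } } hierarchy."""
    if element not in SEGMENTS.get(segment, []):       # look-up 1
        return False
    dtype = DATA_VALUES.get(element)                   # look-up 2
    if dtype and dtype.startswith("table:"):
        table = TABLE_VALUES[dtype.split(":", 1)[1]]   # look-up 3
        return value in table
    return True
```

Note that every table-typed element pays for all three look-ups, which is how the per-message cost multiplies.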

You can see how this piles up, and in standards compliance, it
is worse than you might imagine. For instance, I recently worked on a Health
Level 7 (HL7) application
that processed inbound patient admission, discharge, and transfer (ADT)
documents. Within an IN1 (insurance information) segment alone, among the
elements that included only the Insurance Company ID, name, address, phone
number and contact, there were nineteen table look-ups. That’s just for one
small section of an individual inbound document.

What can you do? Realizing that XSD schemas written for XML data
validation are usually exhaustive, you can practice another, more direct form of
schema reduction in two efficient steps:

  • Throw out all the segments, data values, and table values that are never
    used. Strip the XML documents of all unnecessary information. This is a
    powerful step, since these schemas will generally be used in processing
    every inbound XML document.
  • Condense your validation schemas into a single schema. Once you’ve
    eliminated validation information that isn’t necessary in your particular
    application, go the extra mile: create a single schema to do the work of
    the schemas it replaces.

In the example above, you would be nesting the table values
schema within the data values schema within the segments schema, by
restructuring them as nodes. That is, table values become child elements of a
data value node (when the data value node defines a table type), which in turn
belongs to a parent node that specifies the segment. Will there be redundancy? Yes,
quite a bit. Will it be a lot of work? You bet. But your application performance
will go to warp speed.
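The condensed structure can be sketched like this (again with hypothetical names): the table values sit directly under their data-value node, which sits directly under its segment, so one traversal replaces the three cross-referenced look-ups:

```python
# One condensed structure: table values nested as children of the
# data-value node, which belongs to its parent segment node. Where a
# data value is not a table type, the entry is None (no table check).
CONDENSED = {
    "IN1": {
        "InsCoID": None,                    # free-form element
        "InsCoName": None,                  # free-form element
        "PlanType": {"HMO", "PPO", "POS"},  # table values inlined
    }
}

def validate(segment, element, value):
    """Single walk down the condensed segment/element/table tree."""
    seg = CONDENSED.get(segment)
    if seg is None or element not in seg:
        return False
    allowed = seg[element]
    return True if allowed is None else value in allowed
```

The redundancy the text warns about shows up here too: a table repeated under every data value that uses it. The payoff is that validation becomes one nested look-up instead of three schema cross-references.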