Recently I needed to create a Web service that would allow an outside
organization to enter orders on a legacy system using XML. My initial optimism
about this project was squashed by the reality of dealing with the limitations
of a system that was designed and coded sometime during the Carter
Administration. While I had anticipated that the XML would need to be massaged
in order to produce something that my mainframe counterpart could use, I hadn’t
foreseen how differently the outside organization and the legacy application
each defined an order.

Muenchian grouping

Most of the major philosophical differences in what constitutes an order
stemmed from the fact that the outside organization stored customer information
at the line level while the legacy application stored customer information at
the order level. This means that every line on the inbound order could
conceivably belong to a different customer, a concept that was not contemplated
during the Carter years. Compounding the problem, the legacy system supported
only a single ship-to address, a single bill-to address, and a single carrier
per order. What had started out as a relatively simple Web service quickly
began to snowball into both a major project and a major headache.

After a brief delusional moment where some kind of convoluted C# program was
considered for splitting an inbound order into something the legacy system
could handle, I decided to try a more elegant approach: XSLT and Muenchian
grouping.


Developed by, and named after, Steve Muench of Oracle, the Muenchian grouping
method uses the xsl:key element and the key() and generate-id() functions to
define keys that are used to retrieve nodes. To illustrate how this works,
let’s start with the XML document shown in Listing A and, for the sake of
simplicity, group on the customer element only. With Muenchian grouping we
would create XSLT that looks something like what is shown in Listing B.
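
As a rough sketch of the kind of input and stylesheet described above, consider the following. The element names order, line, customer, carrier, and item are assumptions made for illustration and are not necessarily those used in Listing A or Listing B; only the key name keyGroup comes from the article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: customer information is stored on each line -->
<order>
  <line>
    <customer>ACME</customer>
    <carrier>UPS</carrier>
    <item>Widget</item>
  </line>
  <line>
    <customer>Globex</customer>
    <carrier>FedEx</carrier>
    <item>Gadget</item>
  </line>
  <line>
    <customer>ACME</customer>
    <carrier>UPS</carrier>
    <item>Sprocket</item>
  </line>
</order>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every line element by the string value of its customer child -->
  <xsl:key name="keyGroup" match="line" use="customer"/>

  <xsl:template match="/order">
    <orders>
      <!-- Select only the first line of each customer group -->
      <xsl:for-each select="line[generate-id(.) =
                                 generate-id(key('keyGroup', customer)[1])]">
        <order customer="{customer}">
          <!-- Retrieve every line that shares this customer value -->
          <xsl:for-each select="key('keyGroup', customer)">
            <line>
              <xsl:copy-of select="item"/>
            </line>
          </xsl:for-each>
        </order>
      </xsl:for-each>
    </orders>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample input, this sketch should produce one order element per distinct customer: one for ACME containing the Widget and Sprocket lines, and one for Globex containing the Gadget line.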

Examine the components

At first glance Muenchian grouping doesn’t appear to make much sense. How can
an xsl:key element, along with a couple of functions, group the nodes in an XML
document? To understand the inner workings it is necessary to examine the
individual components; only then does the simple elegance become clear.

The xsl:key element is a top-level XSLT element that is used to declare a named
key. The element’s match attribute defines the nodes returned from the key()
function, while the use attribute defines the key value itself. The key()
function takes the name of a key and a value and returns every node that is
indexed under that value. The purpose of the predicate, [1], which is applied
to the result of key(), is to ensure that only the first node for each key
value is selected. This means that each key value is processed only once
instead of once for every time it occurs. Without this first-node test, if a
particular key value occurred twice the associated elements would be output
twice, if it occurred three times they would be output three times, and so
forth.
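
To make the behavior of key() and the [1] predicate concrete, here is a small diagnostic sketch. It assumes a hypothetical input whose order root element contains three line elements, each with a customer child, two of which have the value ACME; those element names and values are illustrative assumptions only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- match: which nodes are indexed; use: the value they are indexed under -->
  <xsl:key name="keyGroup" match="line" use="customer"/>

  <xsl:template match="/order">
    <!-- key() returns every line indexed under the value 'ACME' -->
    <xsl:text>lines for ACME: </xsl:text>
    <xsl:value-of select="count(key('keyGroup', 'ACME'))"/>
    <xsl:text>&#10;</xsl:text>

    <!-- The [1] predicate keeps only the first of them in document order -->
    <xsl:text>after [1]: </xsl:text>
    <xsl:value-of select="count(key('keyGroup', 'ACME')[1])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Under the stated assumption about the input, the first count is 2 and the second is 1, which is why the first node can stand in as the single representative of its group.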

The generate-id() function is the final piece. It returns a string that
uniquely identifies a node, regardless of how the node is obtained. So, if
generate-id(.) = generate-id(key('keyGroup', customer)[1]) is true, then both
expressions refer to the same node, just obtained differently. It is also
important to remember that while the string is always unique within a single
transformation, there is no guarantee that the ID will be the same from
execution to execution.
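
The comparison can be watched in action with a sketch like the one below, which labels each line as either the first node of its group or a repeat. It again assumes a hypothetical order/line/customer structure. The second test shown, count(. | key(...)[1]) = 1, is a commonly used alternative that relies on a node-set union containing a single node only when both operands are the same node; it is included for comparison and is not taken from the article's listings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="keyGroup" match="line" use="customer"/>

  <xsl:template match="/order">
    <xsl:for-each select="line">
      <xsl:value-of select="customer"/>
      <xsl:text>: </xsl:text>
      <xsl:choose>
        <!-- Same node reached two ways: directly and through the key -->
        <xsl:when test="generate-id(.) =
                        generate-id(key('keyGroup', customer)[1])">first in group</xsl:when>
        <xsl:otherwise>repeat</xsl:otherwise>
      </xsl:choose>
      <!-- Equivalent identity test using a node-set union -->
      <xsl:if test="count(. | key('keyGroup', customer)[1]) = 1">
        <xsl:text> (union test agrees)</xsl:text>
      </xsl:if>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For an input whose lines carry the customers ACME, Globex, and ACME in that order, the first two lines are reported as first in group and the third as a repeat. The actual strings returned by generate-id() are never printed here because they are processor-specific.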

Multiple criteria

Occasionally, when grouping, it is necessary to group information on more than
one criterion. Consider the XML document from Listing A. Wouldn’t it make sense
to group on carrier, ship-to address, and bill-to address in addition to
customer? Seems easy enough: just add another xsl:key element and another key()
function, right? Wrong!

For a single level of Muenchian grouping, one xsl:key element and one key()
lookup are all that is needed. Multiple grouping criteria are handled with the
concat() function, which concatenates the values of the various elements and/or
attributes to produce a single key. With this in mind, Listing C shows the XSLT
that performs grouping on the customer, carrier, ship-to address, and bill-to
address.
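
A composite key of this kind might be sketched as follows. The element names shipTo, billTo, and item, as well as the '|' separator, are assumptions made for illustration and are not taken from Listing C. The separator is a defensive choice: without it, values such as 'AB' followed by 'C' and 'A' followed by 'BC' would concatenate to the same key.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- One key built from four values, joined with a separator character -->
  <xsl:key name="keyGroup" match="line"
           use="concat(customer, '|', carrier, '|', shipTo, '|', billTo)"/>

  <xsl:template match="/order">
    <orders>
      <!-- The lookup must rebuild the key with exactly the same expression -->
      <xsl:for-each select="line[generate-id(.) = generate-id(
          key('keyGroup',
              concat(customer, '|', carrier, '|', shipTo, '|', billTo))[1])]">
        <!-- Compute the composite value once for the inner lookup -->
        <xsl:variable name="groupKey"
            select="concat(customer, '|', carrier, '|', shipTo, '|', billTo)"/>
        <order customer="{customer}" carrier="{carrier}">
          <xsl:copy-of select="shipTo | billTo"/>
          <!-- Every line sharing all four values lands in this order -->
          <xsl:for-each select="key('keyGroup', $groupKey)">
            <line>
              <xsl:copy-of select="item"/>
            </line>
          </xsl:for-each>
        </order>
      </xsl:for-each>
    </orders>
  </xsl:template>
</xsl:stylesheet>
```

Each output order then has exactly one customer, one carrier, one ship-to address, and one bill-to address, which is the shape the legacy system described earlier expects. The separator should be a character that cannot occur in the data being concatenated.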

How far can Muenchian grouping using concatenated keys be taken? Personally,
I’ve used this technique to group information based upon eleven different
criteria. The resulting XSLT was, when printed, not quite three pages long,
which, while impressive for XSLT, was thousands of lines shorter than programs
written in traditional programming languages.

An adaptable solution

While this may, at first, seem like a rather specific application, it is one of
those tools that can be readily adapted to other uses. I recommend playing with
the examples provided here in order to become comfortable with this technique.
In addition, you may want to purchase a copy of XMLSPY from Altova, which was
used to produce and test the examples provided here. Not only is it a slick
professional XML editor, it also allows the developer to step through an XSLT
stylesheet element by element, which comes in handy when debugging Muenchian
grouping.