By Jason Monberg and Mike Wynholds
XSL, or Extensible Stylesheet Language, consists of three languages for converting XML documents into other formats. XSL Formatting Objects (XSL-FO) describes visual formatting, while XPath accesses specific parts of an XML document. But XSL Transformations (XSLT) is the language for actually converting documents from one XML format into another.
The simplest case starts with two documents: an XML document of original data and an XSLT style sheet for transforming it. An XSLT processor applies the rules from the XSLT style sheet to the XML document to create a new, third document in XHTML, WML, SVG, or almost any other XML format.
Multiple XSLT style sheets can present a single document in multiple formats. A single style sheet could transform multiple instances of one data type into a standard presentation format, which you could change simply by modifying the style sheet. Or XSLT could transform multiple instances of data into multiple formats. And it is not constrained to presentation: XSLT is a powerful tool for translating one system's data format into another's, as in B2B transactions.
Start learning XSLT
Learning to use XSLT effectively takes time. Some aspects are quite intuitive, while others may seem a bit extraneous. Once you become familiar with XSLT and XPath, however, you can get the hang of XSL in a production environment fairly quickly.
To get started, you'll need an XSLT processor. As with any type of technology development, the tools you use can make or break the project. There are only a handful of desktop XSLT prototyping tools available, as the majority of such tools are for full-scale production systems. And you must take into account how well your tools support the XSLT specification.
Recent browsers such as Internet Explorer 5.5, Netscape 6.1, and Mozilla support XSLT processing. They are probably the easiest tools to use but are currently unreliable in their specification support. Also, a browser does not provide the support of a true development tool and won't help much when debugging code. XSLT conversions are usually done on the server, so browsers will work only for XML files that contain a link to a style sheet.
Instant Saxon is a simple, command-line server-style XSLT processor for Windows. It provides basic file output and error information. It delivers more solid XSLT support than is available from browsers. Though not a full-blown development environment, Instant Saxon is a great tool to experiment with.
XML Spy is a complete XML IDE that can be downloaded for evaluation. It can use Instant Saxon as its XSLT processor. This is a good tool for anyone developing with XML in a production environment, though it takes time to master.
If none of these tools work for you or you wish to set up a full production environment, we list a selection of server-based XSLT processors at the end of this tutorial.
A testing example
Our examples assume that the XSL processing tool and the XML and XSL files are all in one directory. In this example, we have an XML document that represents a fast-food lunch order that we need to transform into a readable HTML format.
Copy and paste this XML into a text editor and save it as order.xml. Likewise, copy this XSL into a file named order.xsl. The XML file is linked to the XSL style sheet, so you can view the XML file in a suitable browser or XSL Transform it in XML Spy. With Instant Saxon, open an MS-DOS command prompt, go into the files' directory, and type
saxon.exe order.xml order.xsl > order.html
This will write the transformed HTML output into a file called order.html that you can view in your browser.
The results of the example should look like this: an HTML page with a title showing Mike's order (number 734) and a table of what he ordered, including cost. The XSLT processor took the XML file containing the data and transformed it into the HTML output. The XSLT style sheet defined the HTML tags to place around the XML data using the instruction elements that make up the XSLT language.
Although an XSLT processor is normally instructed which style sheet to apply, an XML document can indicate its own default XSLT style sheet by including the line
<?xml-stylesheet type="text/xsl" href="my.xsl"?>
where my.xsl is a URL to the style sheet. This code is essential for browser-based transforming.
The XSLT style sheet
To understand XSLT programming, you must understand XML, since XSLT not only transforms XML but is also a fully XML-compliant language itself. You could, in theory, write an XSLT style sheet that transforms itself, an interesting if not very useful capability.
Recall that XML is not a language in the normal sense, but a metalanguage—a structure with which one builds any number of XML-compliant languages (XSL is one and XHTML is another). HTML looks XML compliant but actually violates some XML rules.
XML languages define a set of tags used to mark up data into elements, or nodes. For example, in XHTML, a <table> tag begins a specific XML node. XML nodes can contain attributes and body content. Attributes are name/value pairs made up of strings. Body content can be strings and/or more XML nodes. This means that XML is hierarchical and can represent very complex data formats. Consider an XHTML fragment:
Each node has an opening and closing tag, between which are more nodes and a text string. The img node has an src attribute and, having no content, closes its opening tag with an ending slash. It and the text are nested within <td> nodes within a <tr> node nested within a <table>.
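A fragment matching this description might look like the following. The image file name and the cell text are placeholders chosen for illustration, not taken from the original example.

```xml
<table>
  <tr>
    <!-- An empty element: no content, so the tag closes itself -->
    <td><img src="logo.gif"/></td>
    <!-- A text string as body content -->
    <td>Some text in a table cell</td>
  </tr>
</table>
```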
The core ideas of XSLT are establishing a context—which is a particular node or set of nodes—within an XML document and outputting a formatted version of the data that exists within that context. To do this, an XSLT style sheet is separated into discrete templates, each of which handles certain types of tags in the XML document. Within these templates, XSLT utilizes variables, passed parameters, looping constructs, conditionals, and other devices geared toward transforming XML.
The <xsl:stylesheet> element is the outermost element of any XSLT style sheet, assigning it a version and one or more namespaces:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
You can set other attributes, but for almost all basic style sheets, you can use these exact <xsl:stylesheet> tags. The template elements are nested within them.
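As a rough sketch, a complete but nearly empty style sheet built around these tags could look like this. The <xsl:output> element and the HTML wrapper are optional additions for illustration; only the <xsl:stylesheet> element itself is required.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Optional: ask the processor to serialize the result as HTML -->
  <xsl:output method="html"/>

  <!-- Templates are nested here; this one matches the document root -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Process the rest of the document with any other templates -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```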
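The <xsl:template> element defines a context in which to execute along with the resulting output. It has the syntax
<xsl:template match="pattern" name="template-name" priority="number" mode="mode-name"> ... </xsl:template>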
An XSLT processor executes a <xsl:template> when it finds either an explicit call in the style sheet or a matching node in the source XML document. The most common cause is matching nodes encountered as the XSLT processor scans the XML. The match attribute takes an XPath expression, identifying which nodes set off the template.
An activated <xsl:template> element outputs its nested contents. These can consist of text and non-XSLT markup, which go straight to the new document, and more XSLT elements, which execute in the context of the matched node. It is important to keep track of context. An XSLT element processes only the same nodes as the template that activates it.
More than one template may match a node. In this case, fairly complex rules using the mode and priority attributes determine which template will process the node. Most simple style sheets contain only one template to match a given node.
For XML documents that contain mostly marked-up text, such as HTML, your XSLT style sheet will probably contain a template for each tag you might encounter. For XML documents that contain highly structured hierarchical data, your style sheet may contain templates for only the top-level nodes. These templates will know the data structure and access the subnodes directly, rather than leave them to other templates.
For example, this sample XML file contains a short, marked-up book. It consists of one <book> node containing a <title> and multiple <chapter> nodes. This template would execute for each <chapter> within the top-level <book>:
This is chapter <xsl:number/>, entitled "<xsl:value-of select="title"/>"
If an XSLT processor has no matching template for a node or its parent nodes, it simply outputs the node's contents, though these could contain subnodes that set off their own templates. So a style sheet with only the preceding template would produce the following result:
<?xml version="1.0" encoding="utf-8"?>
This is chapter 1, entitled "How it begins"
This is chapter 2, entitled "What transpires"
This is chapter 3, entitled "Where it ends"
The <paragraph> nodes are skipped because their <chapter> parents were processed, but the first <title>, not being in a <chapter>, just gets printed as is.
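To make this concrete, here is a minimal sketch of a book document with the structure described above, followed by a complete style sheet that holds only the chapter template. The book title and paragraph text are placeholders invented for illustration; only the chapter titles come from the output shown above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<book>
  <title>A Placeholder Book Title</title>
  <chapter type="prologue">
    <title>How it begins</title>
    <paragraph>First paragraph of chapter one.</paragraph>
    <paragraph>Second paragraph of chapter one.</paragraph>
  </chapter>
  <chapter>
    <title>What transpires</title>
    <paragraph>First paragraph of chapter two.</paragraph>
    <paragraph>Second paragraph of chapter two.</paragraph>
  </chapter>
  <chapter>
    <title>Where it ends</title>
    <paragraph>First paragraph of chapter three.</paragraph>
    <paragraph>Second paragraph of chapter three.</paragraph>
  </chapter>
</book>
```

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Runs once for every <chapter>. Its <paragraph> children are
       never visited because this template does not ask for them. -->
  <xsl:template match="chapter">
    This is chapter <xsl:number/>, entitled "<xsl:value-of select="title"/>"
  </xsl:template>

</xsl:stylesheet>
```

The book's own <title> has no template here, so the processor's built-in rules pass its text straight through to the output.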
XPath is a language for referencing specific parts of an XML document, although it supports a richer feature set than simply pointing to data. In XSLT style sheets, XPath expressions return four types of values: node-set, Boolean, number, and string. XSLT elements typically take an XPath expression as an attribute value, using the result of the evaluated expression.
The most common use in basic XSLT is to return a node-set or string, depending on the element. For example, <xsl:template match="chapter"> defines a template for <chapter> nodes within the current node context. In this case, the XPath expression chapter returns a node-set as the new context for further XSL functions. Whereas in <xsl:value-of select="title"/>, the XPath expression title returns the raw content of any <title> nodes within the current context as a string.
For pointing to nodes not in the immediate context, XPath navigation looks and behaves much like file-system navigation. Slash characters separate parent and child nodes: chapter/title references only <title> nodes directly within the current context's <chapter> nodes. The common file-system syntax for going up a directory level refers to a node's parent: ../title would point to <title> nodes within the context node's parent, such as the book title as seen from a chapter in our book sample.
An important difference from file navigation is that while you won't find two files with the same name in the same location, you can easily have two nodes of the same type, so an XPath location such as chapter/paragraph often refers to not one, but multiple nodes.
Our paths so far have started from the current context, but just as in a file system, a path can be absolute rather than relative. A starting slash points to the root of the document—not the document's first node, but an abstract node representing the document overall and the default starting context in an XSLT template. So /book/title would return only <title> nodes within the top-level <book>.
The double slash (//) is a wildcard path to a node. In our book sample, <xsl:template match="//title"> would return <title> nodes anywhere in the document, whether at /book/title or /book/chapter/title. The double slash can be in the middle of a path, so for our specific sample, /book//title would work just as well.
An asterisk at the end of a path returns all elements found there, again, similar to file-system wildcards. In our sample, /book/chapter/* would reference both <title> and <paragraph> nodes, while the path //* would return all of the nodes in any document.
XPath provides syntax for selecting specific attributes, one of several instances of an element, and nodes based on comparisons.
The @ sign refers to a node's tag attribute. In our book example, some <chapter> nodes have a type attribute, accessible as @type in the node's context. To access it from anywhere, the path would be /book/chapter/@type.
Square brackets select one node out of a set, much like an array in traditional programming. To select just the second <chapter>, you would use an XPath expression such as /book/chapter[2]. Note that the first node in a set is number one, not zero as it is in many programming languages.
You can combine these constructs to select a node by its attribute value: /book/chapter[@type="prologue"] to select just the first <chapter>. This selection functionality has many variations that are out of the scope of this tutorial but are worth exploring.
Beyond navigation and data extraction, XPath offers functions such as counting characters, setting variables, doing basic math, finding the last element, and other types of pattern matching. These functions are for more advanced XSLT, but our test style sheet had a basic math example. The detailed example at the end of this tutorial uses advanced comparisons and variable settings.
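The following sketch pulls several of these XPath forms together in one template. It assumes a book document shaped as described in this section (a <book> with a <title> and several <chapter> nodes, the first carrying type="prologue"); the labels in the output text and the use of count() are illustrative choices.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Absolute path: the book's own title -->
    Book: <xsl:value-of select="/book/title"/>
    <!-- Positional predicate: the second chapter, counting from 1 -->
    Second chapter: <xsl:value-of select="/book/chapter[2]/title"/>
    <!-- Attribute predicate: the chapter whose type is "prologue" -->
    Prologue: <xsl:value-of select="/book/chapter[@type='prologue']/title"/>
    <!-- A function over a wildcard path: every <title> in the document -->
    Number of titles: <xsl:value-of select="count(//title)"/>
  </xsl:template>

</xsl:stylesheet>
```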
At its simplest, XSLT activates templates as it encounters matching nodes while scanning the XML document. But with added XSLT elements, you can control the flow of template execution to suit your needs.
The <xsl:apply-templates> element is used within a template to tell the XSL processor to match the supplied set of nodes to other templates. It has the syntax
<xsl:apply-templates select="expression" mode="mode-name" />
When a node triggers a template, XSLT normally assumes that the template will address all of the node's contents and doesn't process them. An <xsl:apply-templates> element within a template tells the XSLT processor to process node contents, executing any corresponding templates en route.
By default, <xsl:apply-templates> processes all immediate child nodes. The select attribute lets you specify only particular descendent nodes to process. It takes an XPath expression relative to the current template's context. The mode attribute causes only templates with a specified mode to execute.
For example, if you had templates for both /book and /book/chapter, you would want an <xsl:apply-templates> in the /book template to activate the /book/chapter one:
This book is entitled "<xsl:value-of select="title"/>"
This is chapter <xsl:number/>, entitled "<xsl:value-of select="title"/>"
You can see how templates pass control to other templates in a chain of command. This also breaks up the style sheet into readable pieces and makes template reuse possible.
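Written out in full, the two templates might sit in a style sheet like this. It is a sketch that assumes the book structure described earlier; the select="chapter" on <xsl:apply-templates> is one reasonable way to hand off to the chapter template.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    This book is entitled "<xsl:value-of select="title"/>"
    <!-- Hand each <chapter> child to whichever template matches it -->
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="/book/chapter">
    This is chapter <xsl:number/>, entitled "<xsl:value-of select="title"/>"
  </xsl:template>

</xsl:stylesheet>
```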
The <xsl:call-template> element executes another template by name. It has the syntax
<xsl:call-template name="template-name" />
Like <xsl:apply-templates>, <xsl:call-template> transfers execution temporarily to another template, the one whose name attribute matches. Regardless of its match value, if any, the called template executes in the same context as the calling template. For example:
The name of chapter <xsl:number/> is "<xsl:value-of select="title"/>".
Notice that both <xsl:apply-templates> and <xsl:call-template> can have an ending slash instead of a closing tag. A closing tag is used to nest other XSLT elements that attach special instructions or parameters.
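A complete sketch of calling a template by name follows. The template name chapter-name is hypothetical, and the book structure is the one assumed throughout this tutorial.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <!-- Call by name; the context stays on the current <chapter> -->
    <xsl:call-template name="chapter-name"/>
  </xsl:template>

  <!-- A named template with no match attribute -->
  <xsl:template name="chapter-name">
    The name of chapter <xsl:number/> is "<xsl:value-of select="title"/>".
  </xsl:template>

</xsl:stylesheet>
```

Because the context does not change, title inside the named template still refers to the title of the chapter that made the call.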
Parameters and variables
Parameters and variables are named values that you can reuse through the course of a template. A variable is defined once, while a parameter is a default value that you can override. Either exists only in the context of the template that defines it. When the template and any templates applied or called en route are finished, the variable or parameter ceases to exist. To use a parameter or a variable across several templates, establish it in a template for a higher-level node.
xsl:param / xsl:with-param
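The <xsl:param> element defines a parameter within a template, while <xsl:with-param> provides a value for that parameter when executing the template. They have the syntax
<xsl:param name="name" select="expression" />
<xsl:with-param name="name" select="expression" />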
Use <xsl:param> within <xsl:template> to define the parameter. The name attribute is its unique label, while select is an XPath expression defining the default value of the parameter. An <xsl:with-param> with a matching name within <xsl:apply-templates> or <xsl:call-template> passes an overriding value to the applied or called templates. For example:
<xsl:param name="use-title" select="string('No Title')"/>
The name of chapter <xsl:number/> in the book "<xsl:value-of select="$use-title"/>" is "<xsl:value-of select="title"/>".
<xsl:with-param name="use-title" select="title"/>
Here, the /book template passes the use-title parameter to the /book/chapter template. The value it passes is the XPath expression title, meaning any <title> nodes within the <book> (our example has just one). This parameter overrides the default value 'No Title'.
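Assembled into a full style sheet, the parameter example might read as follows. This is a sketch; the arrangement of the two templates is inferred from the description above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter">
      <!-- Evaluated in the <book> context, so "title" is the book title -->
      <xsl:with-param name="use-title" select="title"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="/book/chapter">
    <!-- xsl:param elements come first in a template; the default is
         used only when the caller supplies no value -->
    <xsl:param name="use-title" select="string('No Title')"/>
    The name of chapter <xsl:number/> in the book "<xsl:value-of select="$use-title"/>" is "<xsl:value-of select="title"/>".
  </xsl:template>

</xsl:stylesheet>
```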
The <xsl:variable> element lets you calculate an expression once and reuse it over and over again, making your code more readable and perhaps more optimized. It has the syntax
<xsl:variable name="name" select="expression" />
The name attribute labels the variable, while select defines its XPath value. XSLT is unlike most programming languages in that once you define a variable, it cannot be changed. This is limiting, but as you get better at writing XSLT style sheets, you will find that, 99 percent of the time, you can achieve your transformation goals without reassigning variables.
<xsl:variable name="var-num"><xsl:number/></xsl:variable>
<xsl:variable name="var-title" select="title"/>
The title of chapter <xsl:value-of select="$var-num"/> (I repeat: <xsl:value-of select="$var-num"/>) is "<xsl:value-of select="$var-title"/>".
Did I mention the title is "<xsl:value-of select="$var-title"/>"?
For all parameter and variable elements, you can specify the value with content between the opening and closing tags instead of a select attribute, such as the variable var-num above. Notice that in both examples, the element <xsl:value-of> references the variable or value by name with a prepended dollar sign.
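Both ways of giving a variable its value can be seen side by side in a sketch like this one, which assumes the same book structure as the earlier examples.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <!-- Value given as nested content -->
    <xsl:variable name="var-num"><xsl:number/></xsl:variable>
    <!-- Value given as an XPath expression -->
    <xsl:variable name="var-title" select="title"/>
    The title of chapter <xsl:value-of select="$var-num"/> is "<xsl:value-of select="$var-title"/>".
    Did I mention the title is "<xsl:value-of select="$var-title"/>"?
  </xsl:template>

</xsl:stylesheet>
```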
Since a template contains content to output to the transformed document, you need a way to output something besides fixed text. These elements output content from the source XML document and text based on template logic.
The <xsl:value-of> element simply outputs the value of an XPath expression. It has the syntax
<xsl:value-of
select="expression"
disable-output-escaping="yes | no" />
The XSLT processor evaluates the select attribute and outputs the result as a string. Node paths produce the nodes' contents while attribute paths, parameters, and variables produce their values. For example:
Book Title = <xsl:value-of select="."/>
Title = <xsl:value-of select="title"/>
Paragraph 1 = <xsl:value-of select="paragraph[1]"/>
Paragraph 2 = <xsl:value-of select="paragraph[2]"/>
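A complete version of this example could look like the sketch below. The split into a book-title template and a chapter template is an assumption, as are the positional predicates that pick out each paragraph; in XSLT 1.0, <xsl:value-of> on a path that matches several nodes outputs only the first.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="title"/>
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="/book/title">
    <!-- "." is the context node itself -->
    Book Title = <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="chapter">
    Title = <xsl:value-of select="title"/>
    <!-- Predicates choose one paragraph each -->
    Paragraph 1 = <xsl:value-of select="paragraph[1]"/>
    Paragraph 2 = <xsl:value-of select="paragraph[2]"/>
  </xsl:template>

</xsl:stylesheet>
```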
The <xsl:number> element outputs a numeric value. It has the syntax
<xsl:number
value="expression"
level="single | multiple | any"
count="pattern"
from="pattern" />
The default result is the position of the current node, among its node type, in the source XML; in the second <paragraph> element, <xsl:number> would output 2. The level attribute determines whether that position is within the current node's parent or the entire document; with level="any", the second <paragraph> of the third <chapter> could output 6. The count and from attributes use patterns (a restricted form of XPath) to designate what nodes should be counted and where counting should start.
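The <xsl:number> element can also supply the content of a variable. In XSLT 1.0 the variable then holds the formatted text rather than a true number, though XPath converts it whenever it is compared with or added to a number, as in: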
<xsl:variable name="var-num"><xsl:number value="4"/></xsl:variable>
There are more <xsl:number> attributes for alternative number systems and grouping. Note that <xsl:value-of> and <xsl:number> are always self-closing, without closing tags or nested content.
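The effect of the level attribute is easiest to see by numbering the same node three ways. The sketch below assumes a book whose chapters each hold several <paragraph> nodes; the output wording and the format string are illustrative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter/paragraph"/>
  </xsl:template>

  <xsl:template match="paragraph">
    <!-- Default (level="single"): position among sibling paragraphs -->
    Paragraph <xsl:number/> of its chapter,
    <!-- level="any": position among all paragraphs in the document -->
    paragraph <xsl:number level="any"/> of the book,
    <!-- level="multiple": chapter number, then paragraph number -->
    section <xsl:number level="multiple" count="chapter | paragraph" format="1.1"/>
  </xsl:template>

</xsl:stylesheet>
```

With two paragraphs per chapter, the second paragraph of the third chapter would be numbered 2, 6, and 3.2 respectively.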
Looping and sorting
An XPath expression can point to several nodes of a given type at once. If you don't have a matching template for such nodes, you need a way to process the complete set from another template. And if you do have a matching template, you may want to process the nodes in a different order from the source XML.
The <xsl:for-each> element loops through a node set and processes each node with the element's nested contents. It has the syntax
<xsl:for-each select="expression"> ... </xsl:for-each>
This element belongs within a template. The select attribute is an XPath expression pointing from the template's context to a node set. The element's nested contents process those nodes in their own context. For example:
This is chapter <xsl:number/>, entitled <xsl:value-of select="title"/>
Inside the loop, the context becomes the current <chapter> node, and the template prints the chapter <title> rather than the book <title>.
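In full, a loop of this kind might be written as in the sketch below, again assuming the book structure used throughout. The first line of output is added here only to contrast the book title with the chapter titles.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <!-- Context is <book>: "title" is the book title -->
    This book is entitled "<xsl:value-of select="title"/>"
    <xsl:for-each select="chapter">
      <!-- Context is now the current <chapter>: "title" is its title -->
      This is chapter <xsl:number/>, entitled "<xsl:value-of select="title"/>"
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```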
The <xsl:sort> element sorts a node set before a <xsl:apply-templates> or <xsl:for-each> element processes it. It has the syntax
<xsl:sort
select="expression"
order="ascending | descending"
data-type="text | number"
case-order="upper-first | lower-first"
lang="language-code" />
The select attribute specifies the node on which to sort, as you may want to sort nodes based on their attributes or subnodes. The order attribute specifies ascending or descending sort order, data-type establishes whether to sort on a numeric or an alphanumeric calculation, and case-order stipulates how to compare uppercase and lowercase letters. The lang attribute tells the XSL processor to sort by language-specific algorithms.
<xsl:sort select="title" order="ascending"/>
The title of chapter <xsl:number/> is "<xsl:value-of select="title"/>".
Sorting the chapter template by title, this sample prints the chapter titles in alphabetical order.
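Placed in a complete style sheet, the sort element nests inside the <xsl:apply-templates> that selects the chapters. This is a sketch under the same assumptions about the book structure as before.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter">
      <!-- Sort key is evaluated relative to each <chapter> -->
      <xsl:sort select="title" order="ascending"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="chapter">
    The title of chapter <xsl:number/> is "<xsl:value-of select="title"/>".
  </xsl:template>

</xsl:stylesheet>
```

Note that <xsl:number/> still reports each chapter's position in the source document. If you want the position in the sorted order instead, the XPath function position() provides it.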
It may not be enough to simply execute or not execute a template for every node of a given type. XSLT provides conditional elements to support exceptions or alternatives.
The <xsl:if> element processes its nested contents if the XPath expression in its test attribute evaluates to true. It has the syntax
<xsl:if test="expression"> ... </xsl:if>
Though similar to conditional constructs in other programming languages, <xsl:if> has no "else" clause. For that, you must use the <xsl:choose> element. For example:
<xsl:if test="$num = 1">
This is chapter 1!
</xsl:if>
<xsl:if test="$num = 2">
This is chapter 2!
</xsl:if>
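For a version that can be run as is, the $num variable has to be defined somewhere. The sketch below assumes it holds the chapter number, set with <xsl:number/>; that definition is a guess at the intent, not part of the fragment above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <!-- Assumed definition: the chapter's position in the book -->
    <xsl:variable name="num"><xsl:number/></xsl:variable>
    <xsl:if test="$num = 1">
      This is chapter 1!
    </xsl:if>
    <xsl:if test="$num = 2">
      This is chapter 2!
    </xsl:if>
    <!-- Any other chapter produces no output: there is no "else" -->
  </xsl:template>

</xsl:stylesheet>
```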
xsl:choose / xsl:when / xsl:otherwise
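Similar to the <xsl:if> element, <xsl:choose> checks many different cases in order, with a default case to execute if all others fail. It has the syntax
<xsl:choose>
<xsl:when test="expression"> ... </xsl:when>
<xsl:otherwise> ... </xsl:otherwise>
</xsl:choose>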
Only the first <xsl:when> that evaluates as true executes its nested contents. In case they all come back false, you can execute default behavior in an <xsl:otherwise> element within <xsl:choose>. For example:
<xsl:choose>
<xsl:when test="$num = 1">
This is chapter 1
</xsl:when>
<xsl:when test="$num = 2">
This is chapter 2
</xsl:when>
<xsl:otherwise>This had better be chapter 3.</xsl:otherwise>
</xsl:choose>
Using any conditional element requires more advanced XPath expressions for calculations and comparisons.
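A complete sketch of a multi-way choice follows. As with the <xsl:if> example, the definition of $num as the chapter number is an assumption made so that the style sheet stands on its own.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/book">
    <xsl:apply-templates select="chapter"/>
  </xsl:template>

  <xsl:template match="chapter">
    <!-- Assumed definition: the chapter's position in the book -->
    <xsl:variable name="num"><xsl:number/></xsl:variable>
    <xsl:choose>
      <!-- Tested in order; only the first true case runs -->
      <xsl:when test="$num = 1">This is chapter 1</xsl:when>
      <xsl:when test="$num = 2">This is chapter 2</xsl:when>
      <!-- Runs only when every xsl:when test is false -->
      <xsl:otherwise>This had better be chapter 3.</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```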
By now, you should have a good idea of what can be done with XSLT, and you should have the knowledge necessary to start writing style sheets. To bring it all together, we have prepared a more complex example that further demonstrates the power of XSLT. Keep in mind that there are still more advanced topics that go well beyond this example, but it will give you an idea of what you can do.
For this example, we start with an XML document representing the soccer matches played in the CONCACAF conference semifinal World Cup 2002 qualifiers. It contains the location, play date, and score of each match. Here is a piece of the XML data (the complete document is quite long):
<?xml version="1.0" encoding="utf-8"?>
<round level="Semifinal" group="C">
<name>Trinidad and Tobago</name>
Our XSLT style sheet presents this information in a way very similar to the official FIFA World Cup Web site. It transforms the match data into HTML, using advanced XPath to calculate wins, losses, and point totals. Mathematical expressions vary the row colors within the HTML tables. Both matching and called templates receive passed parameters and sort node sets.
To view the example at work, copy the complete documents linked below and run them through an XSLT processor.
Complete worldcup.xml document
Complete worldcup.xsl style sheet
The result should look like this XHTML page.
How to use XSLT
As a rule of thumb, the way you present information should be separate from the information itself. Whether you are presenting HTML to a browsing human or a particular XML format to a computer, chances are your data is stored in some other format altogether. If this format is XML or can easily be translated into XML, XSLT is a good candidate for transforming your data into the presentation formats that your recipients require.
While anyone can use XSLT, there are some standard models for using it within a business. In a dynamic publishing environment such as an online store, HTML developers or template engineers are typically responsible for mastering XSLT. Back end engineers develop systems that provide dynamic data in XML, while designers and interface specialists define the visual look and functionality up front. XSLT developers reside at the center of this group, providing flexibility to present the XML as the design group sees fit.
Or perhaps your company is in the business of selling information. You store your data in an XML-compatible format, and other companies subscribe to your XML data feeds. These companies may store their data in an XML-compatible format different from yours. So you offer a translation service that converts your XML feed to their XML format using XSLT. Each different XML format requires only one or a few XSLT style sheets, and the transformations happen in real time as the data is sent to the recipient.
Beyond basic XSLT
Developing XSLT in a full production environment is quite different from simple desktop development. While beyond the scope of this tutorial, some packages are worth mentioning for the interested reader. Most have been developed for Java, although Perl and C++ are also represented:
- The Apache Xalan processor
- Apache Xalan implemented in C++
- Michael Kay's Saxon
- James Clark's XT
- Cocoon publishing framework
- LotusXSL
- Perl XSL module
Many XSL processors have custom extensions or allow you to write your own. With Saxon, you can write your own XSLT tags in Java. Apache's Xalan allows this as well and has a growing extensions library.
JSP tag libraries achieve many of the same goals as XSLT for transforming dynamic XML data into HTML in real time. The two technologies have much in common, and many companies are starting to use them together. The Jakarta Taglibs project is implementing XSLT using JSP technologies.
XSL Formatting Objects, the other third of XSL, is an XML language for complex visual presentation. It combines with XSLT for converting XML data to non-XML formats. The Apache FOP and REXP, for example, are projects for translating XML into Adobe PDF using XSL.
Jason Monberg is cofounder and president of Carbon Five, a San Francisco-based J2EE consulting company. He was previously cofounder and CTO of Sparks.com, an online paper greeting card retailer, and a consultant on enterprise Internet-based systems for clients including GM, Visa, and Levi Strauss. Jason plays Ultimate Frisbee.
Michael Wynholds is cofounder and lead architect of Carbon Five, a San Francisco-based J2EE consulting company. He was previously a lead engineer at Sparks.com and an engineer at Netscape. Michael holds a degree in computer science from UCLA and has mastered Gran Turismo 3.
Carbon Five develops enterprise information management systems based on J2EE and XML technologies. The company is headquartered in San Francisco, and its clients include Fortune 500 financial institutions dependent upon the efficient sharing and management of large volumes of information.