If you’ve been stuck on a legacy project, the gap between what you know and the current and emerging technologies may be widening. One technology that should be on your need-to-know list is XML. The idea behind XML can be a little difficult to grasp at first, although if you know EDI, you’ll have a head start. This article introduces the basics of XML and explains when and why to use it. In a follow-up article, we’ll see how some alternative solutions stack up against XML.

What exactly is XML?
XML stands for Extensible Markup Language, a meta-markup language that provides a format for describing structured data. Although XML is sometimes referred to as the new language of the Web, it’s not a single predefined markup language like HTML. Rather, XML is a language for designing your own markup, and it is meant to supplement other languages. XML is often used for B2B interchanges because companies can agree on a set of tags for passing information back and forth, which saves money and improves efficiency.

In recent years, XML has become a widely used format for data on the Web, giving developers a standard, consistent way to describe and deliver rich, structured data from any application.

XML is flexible. With XML you are not limited to a handful of tags as in HTML but can create as many as necessary. It’s important to note, however, that XML defines only the structure of the data, not its presentation. If you want to produce differentiated formatting for your XML documents, you’ll need to use Extensible Stylesheet Language (XSL) style sheets. XSL allows you to create one XML page and present it in different ways to different users simply by varying style sheets.
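
To make the idea of inventing your own tags concrete, here is a minimal sketch in Python using the standard library's ElementTree module. The hotel record and every tag name in it (hotel, name, city, rate) are made up for illustration; they do not come from any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: every tag and attribute name here is invented for this example.
HOTEL_XML = """<?xml version="1.0"?>
<hotel>
  <name>Example Inn</name>
  <city>Springfield</city>
  <rate currency="USD">129.00</rate>
</hotel>"""


def describe(xml_text):
    """Return (tag, text) pairs for the root element's children, in document order."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


if __name__ == "__main__":
    for tag, text in describe(HOTEL_XML):
        print(f"{tag}: {text}")
```

Notice that nothing in the document says how the data should look on screen. The tags describe what each value is, and that is all.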

XML doesn’t replace HTML; it complements it. XML is designed for data interchange, whereas HTML is the language for displaying and presenting the data on the Web. With XML, the information is stored in the document, while the rendering instructions are stored elsewhere, so the content and presentation layers are separate.
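
The separation of content and presentation can be sketched without any XSL at all. The example below is not XSL; it uses two plain Python functions as stand-ins for two style sheets, and the rates document and its tag names are assumptions made for the example. The point is only that one XML document can feed several renderings.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data; the tag names are invented for this example.
RATES_XML = """<rates>
  <hotel><name>Example Inn</name><price>129.00</price></hotel>
  <hotel><name>Sample Suites</name><price>149.50</price></hotel>
</rates>"""


def render_text(root):
    """Render each hotel as one plain-text line."""
    return "\n".join(
        f"{h.findtext('name')}: {h.findtext('price')}" for h in root.iter("hotel")
    )


def render_html(root):
    """Render the same data as an HTML unordered list."""
    items = "".join(
        f"<li>{escape(h.findtext('name'))} ({escape(h.findtext('price'))})</li>"
        for h in root.iter("hotel")
    )
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    root = ET.fromstring(RATES_XML)
    print(render_text(root))
    print(render_html(root))
```

In a real deployment the two renderings would more likely be two XSL style sheets applied by an XSL processor, but the division of labor is the same: the data lives in the XML, the formatting rules live somewhere else.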

Even though XML may look similar to HTML (and, like HTML, it is derived from Standard Generalized Markup Language, or SGML), XML isn’t based on a fixed set of predefined tags. Instead, XML is structured text, encoded in Unicode rather than limited to ASCII, with embedded tags that allow data to be self-describing and extensible.

Like HTML, XML is a platform-independent industry standard that the World Wide Web Consortium (W3C) manages. XML provides a uniform method for describing and exchanging structured data that is independent of applications or vendors.

XML is relatively easy to read, and, because it’s relatively simple, most modern platforms are capable of working with XML, making it a very popular Web-friendly technology.

Understanding XML is becoming more important with the creation of XML databases, various XML standards, and XML-based technologies, such as ADO.NET and Visual Studio .NET, which use XML to store, manipulate, and exchange data.

As more XML specifications develop and evolve, XML continues to grow into a powerful and flexible technology capable of serving many application domains. XML is becoming a key part of many technologies, including databases and data access layers (DBMSs and ADO), remote procedure call mechanisms such as SOAP, business-to-business and business-to-consumer integration, messaging software, and data warehouses.

Common XML terms
As you begin orienting yourself with XML, one of the first places to start is with the terminology. Here’s a rundown of some of the terms you’re likely to encounter.

XML document
An XML document has a logical structure (declarations, elements, comments, character references, and processing instructions) and a physical structure (entities, starting with the root, or document entity).
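
As a rough illustration of the logical structure, the sketch below parses a small document with Python's standard minidom module and reports what kind of node each top-level item is. The document itself, including the rates.xsl style sheet it names, is hypothetical. One detail to be aware of: minidom consumes the XML declaration while parsing, so it does not show up among the child nodes.

```python
from xml.dom import minidom, Node

# Hypothetical document: a declaration, a processing instruction, a comment,
# and a root element whose text contains a character reference (&#233; is "é").
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="rates.xsl"?>
<!-- All names in this document are invented -->
<rates>
  <hotel>Caf&#233; Inn</hotel>
</rates>"""

NODE_KINDS = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
}


def top_level_kinds(xml_text):
    """Return the kind of each node that sits directly under the document."""
    doc = minidom.parseString(xml_text)
    return [NODE_KINDS.get(node.nodeType, "other") for node in doc.childNodes]


def hotel_text(xml_text):
    """Return the text of the first <hotel>, with character references resolved."""
    doc = minidom.parseString(xml_text)
    hotel = doc.getElementsByTagName("hotel")[0]
    return "".join(
        node.data for node in hotel.childNodes if node.nodeType == Node.TEXT_NODE
    )


if __name__ == "__main__":
    print(top_level_kinds(DOC))
    print(hotel_text(DOC))
```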

XML engine
An XML engine is software, such as Internet Explorer 5, that supports XML functionality on the client. The components of an XML engine include the XML parser, the XSL processor, and schema support.

XML schema
An XML schema is a formal specification of element names that indicates which elements are allowed in an XML document and in what combinations. The schema also defines the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements. It defines whether an element is empty or can include text. The schema can define default values for attributes.

XSL processor
An XSL processor allows developers to transform XML data to HTML, via a style sheet defining presentation rules.

XML parser
An XML parser is software that reads XML documents and provides access to their content and structure. The parser generates a hierarchically structured tree and hands the data off to viewers and other applications for processing; in a browser, the results are then returned for display. Every XML parser checks that a document is well formed and reports syntax errors. A validating XML parser additionally checks the document against its DTD or schema.
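
Here is a small sketch of what working with a parser looks like, again using Python's standard ElementTree module. ElementTree is a non-validating parser: it builds the tree and rejects documents that are not well formed, but it does not check a document against a schema. The function names outline and is_well_formed are our own, chosen for the example.

```python
import xml.etree.ElementTree as ET


def outline(xml_text):
    """Parse xml_text and return (depth, tag) pairs describing the element tree."""
    result = []

    def walk(elem, depth):
        result.append((depth, elem.tag))
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return result


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(outline("<a><b><c/></b><d/></a>"))
    print(is_well_formed("<a><b></a>"))  # <b> is never closed
```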

When is XML used?
Many companies and industries are using XML for collaboration. XML lets you create information once and reuse the data in various ways. It offers the flexibility of taking data from one application in one format and putting it into another application in another format, with XML as the data transport format.

With XML, you can standardize the way you exchange data with other companies or with other departments within your organization. Let’s look at a scenario that illustrates one role XML might play in the enterprise.

Let’s say you own a travel agency, and clients often call you to determine the best rates on hotels. To get the information requested by your clients, you could call each company (Hilton, Marriott, Holiday Inn, etc.) to find out its prices. You could also visit each company’s Web site to obtain pricing information. The problem with these approaches is that they’re time consuming, and each vendor provides information in a different format. If you want to gather the information more quickly, and in the same format, you’ll need a different strategy.

One solution is to ask each vendor to send you the information electronically in XML format. You create an application that will translate the XML data and combine it with the translated data provided by other vendors to create a standardized listing. Of course, it’s important that you and your vendors use the same set of tags to make sure that your application is not looking for a tag called “price” when your vendor has referred to it as “amount.” With XML, you invent the tags yourself, so you have to make sure that your vendors comply with a set of tags that all parties agree on and that those tags mean the same things to everyone. Once you receive your XML data, you can use HTML in your application to display it in the format you prefer.
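
The travel agency scenario might be sketched as follows. The vendor feeds are invented, and the agreed tag set (rates, hotel, name, price) is an assumption made for the example. The third feed uses amount instead of price to show what happens when a vendor strays from the agreement.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds. Vendors A and B use the agreed tags; vendor C does not.
VENDOR_A = "<rates><hotel><name>Example Inn</name><price>129.00</price></hotel></rates>"
VENDOR_B = "<rates><hotel><name>Sample Suites</name><price>99.50</price></hotel></rates>"
VENDOR_C = "<rates><hotel><name>Other Hotel</name><amount>80.00</amount></hotel></rates>"


def read_feed(xml_text):
    """Return (name, price) tuples; raise ValueError if an agreed tag is missing."""
    listing = []
    for hotel in ET.fromstring(xml_text).iter("hotel"):
        name = hotel.findtext("name")
        price = hotel.findtext("price")
        if name is None or price is None:
            raise ValueError("feed does not use the agreed <name> and <price> tags")
        listing.append((name, float(price)))
    return listing


def combine(feeds):
    """Merge several feeds into one listing, cheapest hotel first."""
    merged = [row for feed in feeds for row in read_feed(feed)]
    return sorted(merged, key=lambda row: row[1])


if __name__ == "__main__":
    print(combine([VENDOR_A, VENDOR_B]))
    try:
        read_feed(VENDOR_C)
    except ValueError as err:
        print("Rejected vendor C:", err)
```

Failing loudly on a missing tag is a design choice for this sketch. An application could instead map known synonyms such as amount to price, but that only works for the mismatches someone has anticipated.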

Let’s consider another scenario. Large organizations often require the integration of data from one system into another. XML can be used to transfer data between departments or applications. Instead of using fixed-length or delimited text to pass the data from one application to another and then creating a module to translate the data for the receiving application, you can use the XML format to make the transfer nearly seamless.
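
For comparison, the sketch below converts a delimited export into XML. The pipe-delimited sample and its field names are hypothetical, and the code assumes that every header is a valid XML element name. Once the data is in XML, the receiving application can look fields up by name instead of by column position.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical delimited export; the first line holds the field names.
DELIMITED = "id|name|department\n17|Pat Example|Engineering\n42|Sam Sample|R&D\n"


def delimited_to_xml(text, delimiter="|"):
    """Convert delimited text with a header row into an <employees> XML string."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    root = ET.Element("employees")
    for row in reader:
        employee = ET.SubElement(root, "employee")
        for field, value in row.items():
            # Assumes each header is a valid XML element name.
            ET.SubElement(employee, field).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(delimited_to_xml(DELIMITED))
```

The serializer escapes special characters for you, so the ampersand in R&D is written out as an entity and read back unchanged by the receiving parser.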

Alternatives
While XML may be the best solution in many situations, it’s important to know when it isn’t the best approach to take. Next time, we will discuss some competing technologies and examine their advantages and shortcomings compared to XML.
