One of the more intriguing things to come out of Microsoft lately was a nearly throwaway comment from Don Box, a principal creator of the SOAP architecture and an architect with Microsoft’s XML Standards group. Speaking at the IDEAlliance XML Conference 2002 in Baltimore, Box announced that Microsoft had begun development on a new language he dubbed X#. He gave few specifics, but he did indicate that the language would treat XML as a first-class citizen in the .NET programming arena.

Most of what is known about X# outside the confines of Redmond is speculation, but enough information is available to make a reasonable approximation of what the language entails.

Initially, a number of people within the XML community assumed that X# would be an alternative to the W3C’s XSLT transformation language, which is widely used to convert XML into HTML or into other XML formats. However, this appears unlikely given Microsoft’s emphasis on imperative programming models on the .NET platform. Moreover, there is little real need for such an alternative: the core audience for XSLT, a rich but admittedly complex language for most programmers to learn, has stayed fairly small compared to the number of .NET developers.

Possibly a replacement for XML schema
Others within the community think it’s more likely that X# is a language that extends one of the more radical aspects of current XML development, the use of Post-Schema-Validation Infosets. An XML document can be thought of as a structured collection of information, or an Infoset. When XML was first conceived as a subset of the document meta-language SGML, it had no concept of data type: you couldn’t specify that a <cost> tag in XML should contain numeric data, let alone restrict the data to a decimal number with two digits after the decimal point. Work on the XML Schema Definition Language (XSD), which provides this ability, started while the XML specification itself was still under development. Even so, it took three and a half years and a lot of wrangling to create a type definition language, written in XML, for XML documents.

One purpose of an XML schema is to describe the set of rules that define a valid XML document. The process of checking a document against a schema, known as validation, typically involves creating a binary object in which the type information is explicitly attached to the XML elements themselves. Thus, any attempt to change the contents of a <cost> tag to a string, a date, or a number outside the element’s prescribed domain will generate an error that the processor can intercept. Once a document has passed validation, the result can be referred to as a Post-Schema-Validation Infoset (PSVI).
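
To make the idea concrete, here is a minimal Python sketch of what validation buys you. It is not a real schema processor and has nothing to do with X# itself: the rule for <cost>, the COST_PATTERN name, and the validate_cost function are all assumptions invented for illustration, standing in for a schema that restricts <cost> to a decimal with two digits after the decimal point.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical rule standing in for a schema: <cost> must be a decimal
# number with exactly two digits after the decimal point.
COST_PATTERN = re.compile(r"\d+\.\d{2}")


def validate_cost(xml_text):
    """Parse the document and return the <cost> value as a typed Decimal.

    Raises ValueError if <cost> is missing or its content breaks the rule.
    """
    root = ET.fromstring(xml_text)
    cost = root.find("cost")
    if cost is None or cost.text is None:
        raise ValueError("missing <cost> element")
    text = cost.text.strip()
    if not COST_PATTERN.fullmatch(text):
        raise ValueError(f"invalid <cost> content: {text!r}")
    # After this point the value carries a type, not just characters.
    return Decimal(text)


print(validate_cost("<item><cost>19.95</cost></item>"))  # prints 19.95
```

Passing a document such as <item><cost>cheap</cost></item> raises a ValueError instead, which is the kind of interceptable error described above.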

PSVIs have a number of interesting and useful properties. For starters, they can look and act a lot like classes, so much so that with the proper framework you could build a very robust language with object-oriented capabilities directly from them. Because a schema is effectively unique, assuming it has been assigned a unique label called a namespace, it should in theory be possible to attach to the namespace a set of methods that describe what the object can do, with those methods potentially defined as XML entities themselves.
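
One way to picture methods hanging off a namespace is a simple registry keyed by namespace URI. The Python sketch below is speculative and is not based on anything Microsoft has shown: the METHODS registry, the method decorator, the call helper, and the http://example.com/invoice namespace are all hypothetical names made up for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical registry: namespace URI -> {method name: function}.
METHODS = {}


def method(namespace):
    """Decorator that registers a function as a method of a namespace."""
    def register(func):
        METHODS.setdefault(namespace, {})[func.__name__] = func
        return func
    return register


INVOICE_NS = "http://example.com/invoice"  # made-up namespace for illustration


@method(INVOICE_NS)
def total(element):
    """Sum every <cost> child that lives in the invoice namespace."""
    costs = element.findall(f"{{{INVOICE_NS}}}cost")
    return sum((Decimal(c.text) for c in costs), Decimal("0"))


def call(element, name):
    """Look up a method by the element's namespace and invoke it."""
    # ElementTree stores qualified tags as "{namespace}localname".
    tag = element.tag
    namespace = tag[1:].partition("}")[0] if tag.startswith("{") else ""
    return METHODS[namespace][name](element)


doc = ET.fromstring(
    f'<invoice xmlns="{INVOICE_NS}"><cost>19.95</cost><cost>5.05</cost></invoice>'
)
print(call(doc, "total"))  # prints 25.00
```

The document itself never mentions total; the behavior is found purely through the namespace, which is what lets the namespace play the role a class name plays in a conventional language.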

Not without precedent
There are a couple of intriguing precedents for this. Dr. Haruo Hosoya and Dr. Benjamin Pierce in 2002 developed a statically typed functional language called XDuce that uses schema types in a fashion similar to what I’ve described. Dr. Pierce later teamed with Vladimir Gapeyev, Michael Levin, and Alan Schmitt to produce another language based upon XDuce called Xtatic. Xtatic acts as a lightweight extension to the C# language that combines pattern and regular expression matching with procedural forms. The resulting language can load an XML document as a PSVI, complete with a relevant set of methods, treat it in exactly the same manner as any other CLR class, and then write it back out as an XML document.

If that’s what Microsoft has in mind, and it can pull it off, X# could fundamentally change the way that programmers develop code. Currently, in languages such as C#, the structure of classes is fixed at design time, giving better runtime performance and efficiency at the cost of some flexibility. With a schema-based architecture, applications could instead be specified as XML sequences that contain linked references to the associated schemas. An application could then be built on the fly based on internal preferences, or perhaps generated by some automatic process to handle a given situation, with its parts built on an as-needed basis. There would certainly be an initialization cost, but once created, these objects would be as viable as their classically derived counterparts. You don’t need to look much further than the architecture of the .NET runtime, where an assembly’s MSIL is compiled as needed, to see a precedent in action.
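
Dynamic languages already allow a rough version of this, which makes the idea easier to picture. The Python sketch below builds a class at runtime from a drastically simplified schema description, loads an XML fragment into it, changes a typed value, and writes the result back out. It says nothing about how X# would actually work: PRODUCT_SCHEMA, build_class, and the <product> vocabulary are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, drastically simplified "schema": field name -> converter.
PRODUCT_SCHEMA = {"name": str, "cost": Decimal}


def build_class(class_name, schema):
    """Create a class at runtime whose attributes mirror the schema fields."""

    def __init__(self, element):
        # Convert each child element's text into the type the schema names.
        for field, convert in schema.items():
            setattr(self, field, convert(element.findtext(field)))

    def to_xml(self):
        # Serialize the typed attributes back into an XML fragment.
        root = ET.Element(class_name.lower())
        for field in schema:
            ET.SubElement(root, field).text = str(getattr(self, field))
        return ET.tostring(root, encoding="unicode")

    return type(class_name, (object,), {"__init__": __init__, "to_xml": to_xml})


Product = build_class("Product", PRODUCT_SCHEMA)
item = Product(
    ET.fromstring("<product><name>Widget</name><cost>19.95</cost></product>")
)
item.cost += Decimal("1.00")  # cost is a real Decimal, so arithmetic works
print(item.to_xml())
# prints <product><name>Widget</name><cost>20.95</cost></product>
```

The class does not exist until build_class runs, which is the initialization cost mentioned above; after that, Product behaves like any hand-written class.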

This is not the first time that Microsoft has explored the use of XML as a full programming language, by the way. Michael Corning, Stephen Mohr, Erik Fuller, Don Kackman, and Michael John at one time pushed an XML-based architecture called Faceplates, or as Corning prefers to call it, schema-based programming. Faceplates was built around the notion of Petri nets (incidentally, named for mathematician Carl Adam Petri, not the petri dishes you saw in biology class) and used state transitions and XML as a vocabulary for building complex applications. Faceplates utilized XSLT and JavaScript but could just as well have used any other state transformation and programming languages. Whether schema-based programming makes its way into X# is unknown at this point, but given the general fitness of XML for such programming, it wouldn’t be surprising.

Still probably a ways off
The big question is this: Will X# be appearing in a version of .NET near you? I’m not sure that it will anytime soon. Given that many of Microsoft’s current development efforts are directed toward the release of Longhorn in a few years, it is much more likely that X# will be an interesting piece of Visual Studio 2006. Still, it’s worth watching for, as X# will likely end up being as significant a language in its own way as C# has been for the future of Microsoft.