XML Schema is a W3C recommendation that provides the tools to define the structure, content, and semantics of an XML document. Compared to document type definition (DTD) and XDR (XML-Data Reduced), two other schema modeling tools, XML Schema delivers two key advantages. First, since it’s the official W3C recommendation for defining the structure of XML data, organizations will be working from the same definition. Second, it is the newest of the three schema technologies, so it was designed to address the shortcomings of the other two, especially DTD.

XDR isn’t so much an alternative schema technology as it is Microsoft’s implementation of an early working draft of the XML Schema specification. In .NET, XDR is supported primarily for backward compatibility. XML extensions of SQL Server 2000 and Microsoft’s COM parser (MSXML) still use it extensively.

Let’s start by examining how the XML Schema object model allows you to use .NET classes to manipulate schema components. Then, we’ll look at several ways you can work with schema information.

A schema backgrounder
XML Schema represents the XML type system and should be used to describe classes and objects when they serialize their state to other applications and platforms. The .NET XML Schema object model (SOM) helps bridge the .NET-specific type system and the XML Schema type system, and makes it easier to create and modify schemas programmatically. A schema file is an XML document saved with the .xsd extension.

All the data types that can be used in XML Schema documents have a .NET counterpart. Once an XSD schema has been compiled into a .NET representation object model, you can access it using the SOM classes. The schema compiler assembles XSD into an XmlSchema object that exposes the schema information through methods and properties.

An effective serialization mechanism between XSD and complex binary classes on a given platform offers tremendous potential and is a key step on the way to full cross-platform interoperability. In .NET, XML serialization is accomplished through the XmlSerializer class and by exploiting the services of the XML Schema definition tool (Xsd.exe). The tool is a binary executable shipped with the .NET Framework SDK. You’ll find it in the BIN subdirectory of the .NET Framework SDK installation path, normally C:\Program Files\Microsoft Visual Studio .NET\FrameworkSDK.

Among other things, Xsd.exe can generate a C# or Visual Basic class from an XSD file and infer an XML Schema from a source XML file. This tool is also responsible for the XML Schema-related magic performed by Visual Studio .NET.

Examining the SOM
The .NET Framework provides a hierarchy of classes to edit existing schemas or create new ones from the ground up. The classes are defined in the System.Xml.Schema namespace. The root class of the namespace is XmlSchema. Once applications hold an instance of the class, they can load an existing XSD file and populate the internal properties and collections with the contained information. By using the XmlSchema programming interface, you can then add or edit elements, attributes, and other schema components. Finally, the class exposes a Write method that lets you persist the current content of the schema to a valid stream object.
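As a minimal sketch of that workflow, the following C# program builds a small schema from scratch and persists it with Write. The element and type names (book, BookType, title) are invented for illustration and do not come from the article's listings.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class BuildSchema
{
    static void Main()
    {
        // Start from a new, empty schema (default constructor).
        var schema = new XmlSchema();

        // <xs:element name="title" type="xs:string"/> (hypothetical name)
        var title = new XmlSchemaElement
        {
            Name = "title",
            SchemaTypeName = new XmlQualifiedName("string", XmlSchema.Namespace)
        };

        // <xs:complexType name="BookType"> whose content is a sequence
        // containing the title element.
        var sequence = new XmlSchemaSequence();
        sequence.Items.Add(title);
        var bookType = new XmlSchemaComplexType
        {
            Name = "BookType",
            Particle = sequence
        };
        schema.Items.Add(bookType);

        // <xs:element name="book" type="BookType"/>
        var book = new XmlSchemaElement
        {
            Name = "book",
            SchemaTypeName = new XmlQualifiedName("BookType")
        };
        schema.Items.Add(book);

        // Persist the schema; Write also accepts a Stream or an XmlWriter.
        schema.Write(Console.Out);
    }
}
```

Top-level components go into the Items collection of the XmlSchema object; nested components hang off the properties of their parent, such as Particle on a complex type.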

There are two ways to create an instance of the XmlSchema class: You can use the default constructor, which returns a new, empty instance of the class, or you can use the static Read method.

The Read method operates on schema information available through a stream, a text reader, or an XML reader. The schema returned is not compiled yet. The Read method accepts a second argument that’s a validation event handler. You can set this argument to null, but you won’t be able to catch and handle validation errors in the schema being read. Listing A shows how to read and compile a schema using the .NET SOM.
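Independently of the listing referenced above, here is a minimal sketch of the read-then-compile sequence. It assumes a current .NET runtime, where the Compile method of XmlSchema is marked obsolete and compilation goes through XmlSchemaSet instead; the inline schema text is a hypothetical stand-in for a real .xsd file.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ReadAndCompileSchema
{
    // Hypothetical schema, inlined so the sample needs no external file.
    const string Xsd = @"<?xml version='1.0'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='book' type='BookType'/>
  <xs:complexType name='BookType'>
    <xs:sequence>
      <xs:element name='title' type='xs:string'/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>";

    static void Main()
    {
        // Receives errors and warnings found in the schema itself.
        ValidationEventHandler onSchemaIssue = (sender, e) =>
            Console.WriteLine(e.Severity + ": " + e.Message);

        // Step 1: read. The returned schema is not compiled yet.
        XmlSchema schema;
        using (var reader = new StringReader(Xsd))
        {
            schema = XmlSchema.Read(reader, onSchemaIssue);
        }
        Console.WriteLine("Compiled after Read: " + schema.IsCompiled);

        // Step 2: compile through an XmlSchemaSet (assumption: modern runtime).
        var schemas = new XmlSchemaSet();
        schemas.ValidationEventHandler += onSchemaIssue;
        schemas.Add(schema);
        schemas.Compile();
        Console.WriteLine("Compiled after Compile: " + schema.IsCompiled);
    }
}
```

Passing a handler rather than null is the safer default: with a handler in place, problems in the schema are reported through the callback where you can log or collect them.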

Once the schema has been compiled, you can access the constituent elements of the schema as defined by the post-schema-validation infoset (PSVI). To access the actual types in the schema, use the SchemaTypes collection of the XmlSchema object.

One of the differences between the information available before and after compilation is that an included (not defined in-place) complex type will not be detected until the schema is compiled. For example, suppose you use the <xs:include> element to pull in an external type definition. To programmatically detect the presence of the type, you must first compile the schema. Compilation expands the <xs:include> element and brings in the type definition.

The code snippet in Listing B demonstrates how to get the list of complex types defined in the specified schema after compilation.
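Separately from the listing mentioned above, the idea can be sketched as follows. The sample assumes a current .NET runtime and compiles through XmlSchemaSet; the schema and its type names (BookType, AuthorType, IsbnType) are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ListComplexTypes
{
    // Hypothetical schema with two complex types and one simple type.
    const string Xsd = @"<?xml version='1.0'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:complexType name='BookType'>
    <xs:sequence>
      <xs:element name='title' type='xs:string'/>
      <xs:element name='author' type='AuthorType'/>
    </xs:sequence>
    <xs:attribute name='isbn' type='IsbnType'/>
  </xs:complexType>
  <xs:complexType name='AuthorType'>
    <xs:sequence>
      <xs:element name='name' type='xs:string'/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name='IsbnType'>
    <xs:restriction base='xs:string'/>
  </xs:simpleType>
</xs:schema>";

    static void Main()
    {
        XmlSchema schema;
        using (var reader = new StringReader(Xsd))
        {
            schema = XmlSchema.Read(reader, null);
        }

        // Compile first; SchemaTypes is a post-compilation collection.
        var schemas = new XmlSchemaSet();
        schemas.Add(schema);
        schemas.Compile();

        // SchemaTypes holds simple and complex types alike,
        // so filter on the runtime class of each entry.
        foreach (XmlSchemaType type in schema.SchemaTypes.Values)
        {
            if (type is XmlSchemaComplexType complexType)
            {
                Console.WriteLine(complexType.Name);
            }
        }
    }
}
```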

Once the schema has been read into memory, you can freely manipulate its structure even before it is compiled, with the obvious limitation that indirect tags, such as <xs:include> and <xs:import>, are detected only as individual, stand-alone objects. In other words, they count for themselves and not for what they are expected to include or import.

Applications and embedded schemas
Schema information is fundamental for letting client applications know about the structure of the XML data they get from servers. However, it is also extra payload that, especially in distributed applications, can take up a portion of the bandwidth.

In some situations, you can treat the schema like the debug information in Windows executables: indispensable during the development of the application; useless and unneeded once the application is released. This pattern does not apply to all applications but, where possible, constitutes an interesting form of optimization. Once the two communicating modules agree on an XML format, and this is hard-coded in software, how can the format of the XML data being exchanged be different?

When the generation of XML documents is not completely controlled by the involved applications, schema validation ceases to be an optional feature. The first approach that comes to mind is to have the client application store the schema locally and load it when needed to validate incoming documents. For .NET applications, the XmlSchema.Read static method is just what you need to load existing schema files.
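A minimal sketch of that approach follows, assuming a current .NET runtime in which validation is configured through XmlReaderSettings. The schema is read from an inline string that stands in for a locally stored .xsd file; the schema, the sample documents, and the CountErrors helper are all hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ValidateIncoming
{
    // Stand-in for a schema file stored on the client (hypothetical content).
    const string Xsd = @"<?xml version='1.0'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='book'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='title' type='xs:string'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Hypothetical helper: reads the whole document with schema validation
    // enabled and returns how many validation errors were reported.
    static int CountErrors(XmlSchemaSet schemas, string xml)
    {
        int errors = 0;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as nodes are read
        }
        return errors;
    }

    static void Main()
    {
        // Load the locally stored schema with the static Read method.
        XmlSchema schema;
        using (var source = new StringReader(Xsd))
        {
            schema = XmlSchema.Read(source, null);
        }

        var schemas = new XmlSchemaSet();
        schemas.Add(schema);
        schemas.Compile();

        string expected = "<book><title>SOM</title></book>";
        string unexpected = "<book><author>unknown</author></book>";

        Console.WriteLine("Errors in expected document: " + CountErrors(schemas, expected));
        Console.WriteLine("Errors in unexpected document: " + CountErrors(schemas, unexpected));
    }
}
```

Compiling the schema set once and reusing it for every incoming document keeps the per-document cost down to the validating read itself.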

An alternative approach entails creating and compiling a schema object dynamically and then using it to validate documents. Either way, validating against the schema ensures that the document is in the expected form.

With XML Schema, you have a standard, rigorous way to describe the layout of a document, one that leaves nothing to the user’s imagination. As we’ve seen here, the XML Schema object model enables you to take advantage of the schema support available in the .NET Framework.