XML may be the key to system interoperability, but successful integration requires a precise definition of what the XML data may contain. Several mechanisms provide such a definition, XML Schema Definition (XSD) schemas being chief among them. The .NET Framework, in turn, supports various operations involving schemas, such as validating XML data against an XSD schema. Before venturing into data validation, however, I will explain XSD schemas and walk through a sample schema to help you create your own.

A brief history of schemas
When reading about schemas, you’ll often encounter references to Document Type Definitions (DTDs). A DTD describes the structure of a Standard Generalized Markup Language (SGML) document. XML is a subset of SGML, so a DTD can define the structure of an XML document. However, DTDs are less than ideal for describing XML data because they focus on structuring text within a document. They lack the ability to describe elements in terms of data types, to define validation ranges, or to encapsulate elements with namespaces. Further, DTDs are specified in SGML syntax rather than the more familiar XML syntax.

Enter XSD. Like a DTD, an XSD schema lets you define the elements and attributes allowed within a document and specify which elements may contain which others. XSD also specifies type information for elements and attributes and allows elements to be placed in a namespace. XSD defines primitive types, such as decimal, string, and time, and allows you to extend the type system by defining your own simple and complex types. Finally, an XSD schema is itself written as an XML document. XSD is a World Wide Web Consortium (W3C) Recommendation, and several XSD documents are available from the W3C site, including a primer and the specifications for structures and data types.

A simple XSD schema
Let’s examine a simple XSD schema for some XML data on movies being shown, similar to the examples from the article “Make XML serialization a snap with .NET attributes.” The XML data file, TheaterValid.xml, is shown in Listing A.

An XSD schema for this data, Showtimes.xsd, is shown in Listing B.

The root element <xsd:schema> defines the schema. The targetNamespace attribute declares that the components defined within the schema belong to the xsdShowtimes namespace. The xmlns attributes declare the namespaces used within the schema. A declaration of the form xmlns:prefix="namespace" assigns a prefix to a namespace, and that prefix then qualifies every component from the namespace that the schema refers to. For example, the components of XSD itself are within the namespace “http://www.w3.org/2001/XMLSchema”, which is bound to the prefix xsd, and so the root element of the document is written as <xsd:schema>. The xsdShowtimes namespace is assigned the prefix mst. The elementFormDefault attribute, set to qualified, declares that elements declared locally in the schema must be namespace-qualified in instance documents, either with a prefix or through a default namespace declaration.
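As a rough illustration of the attributes just described, a schema root along these lines would fit the description. This is a sketch, not a reproduction of Listing B: the exact attribute values are assumptions inferred from the text, and the empty Theater type is only a placeholder so that the skeleton stands on its own.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: attribute values are inferred from the description, not copied from Listing B -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:mst="xsdShowtimes"
            targetNamespace="xsdShowtimes"
            elementFormDefault="qualified">

  <!-- The mst prefix qualifies the reference to a type defined in the target namespace -->
  <xsd:element name="theater" type="mst:Theater"/>

  <!-- Empty placeholder; the Theater type described in the article has child elements -->
  <xsd:complexType name="Theater"/>

</xsd:schema>
```

Note that the xsd prefix and the mst prefix are arbitrary labels. Only the namespace names they are bound to matter to a schema processor.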

The schema defines the element theater (of type Theater), the complex types Theater and Movie, and the simple type PhoneNumber. A simple type is a type definition for attributes, and for elements that have no attributes and contain no other elements. A simple type derives from an existing simple or primitive type and restricts the values it may contain through facets exposed by the base type. PhoneNumber derives from the XSD primitive type string and specifies a regular expression to restrict values through string’s pattern facet. Because a pattern must match the entire value, valid PhoneNumber values are strings that consist of exactly three digits enclosed in parentheses, followed by three digits, a dash, and four digits. XSD also provides built-in simple types of its own that are derived from the primitive types, known as derived types. These include integer, a restriction of decimal, and positiveInteger, which derives from integer by way of nonNegativeInteger.
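A simple type of this kind might be written as follows. This is a minimal sketch: the regular expression is my reading of the prose description (for instance, it assumes no space after the closing parenthesis) and may differ from the one in Listing B.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="xsdShowtimes">

  <!-- PhoneNumber restricts xsd:string through its pattern facet -->
  <xsd:simpleType name="PhoneNumber">
    <xsd:restriction base="xsd:string">
      <!-- Assumed pattern: (ddd)ddd-dddd; XSD patterns are implicitly anchored at both ends -->
      <xsd:pattern value="\(\d{3}\)\d{3}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Under this assumed pattern, a value such as (502)555-1234 would be accepted, while 502-555-1234 would be rejected.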

A complex type is a type definition for elements that have attributes or contain subelements. The complex type Theater defines a type whose content is the element sequence name, phone, and movie. The name element is an XSD string. The phone element is of the type PhoneNumber. The movie element is of the type Movie, the complex type defined after Theater. The name and phone elements must occur exactly once within the sequence, because the minOccurs and maxOccurs attributes both default to 1. On the movie element, minOccurs and maxOccurs are set so that it may occur zero or more times within the sequence. The complex type Movie defines a sequence of elements, like Theater, but also defines the minutes attribute as an XSD positiveInteger.
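Putting these pieces together, the complex types could be sketched roughly as below. The text does not name the child elements of Movie, so title and showing are hypothetical stand-ins, and the PhoneNumber pattern is likewise an assumption; treat this as an illustration of the structure rather than a copy of Listing B.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:mst="xsdShowtimes"
            targetNamespace="xsdShowtimes"
            elementFormDefault="qualified">

  <xsd:element name="theater" type="mst:Theater"/>

  <xsd:complexType name="Theater">
    <xsd:sequence>
      <!-- minOccurs and maxOccurs default to 1, so name and phone occur exactly once -->
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="phone" type="mst:PhoneNumber"/>
      <!-- Zero or more movie elements -->
      <xsd:element name="movie" type="mst:Movie" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Movie">
    <xsd:sequence>
      <!-- Hypothetical child elements; the article does not list them -->
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="showing" type="xsd:time" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
    <!-- Attribute declarations follow the content model -->
    <xsd:attribute name="minutes" type="xsd:positiveInteger"/>
  </xsd:complexType>

  <xsd:simpleType name="PhoneNumber">
    <xsd:restriction base="xsd:string">
      <!-- Assumed pattern, as read from the prose description -->
      <xsd:pattern value="\(\d{3}\)\d{3}-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

An instance document that should conform to this sketch, with made-up values, looks like this:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The default namespace qualifies theater and all of its child elements -->
<theater xmlns="xsdShowtimes">
  <name>Sunshine Cinema</name>
  <phone>(502)555-1234</phone>
  <movie minutes="120">
    <title>Example Feature</title>
    <showing>19:30:00</showing>
  </movie>
</theater>
```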

What next?
By creating a schema for your data, you define what your XML data should look like, and you can then use that schema to validate the data. Putting the schema to work is the next step. In my next article, I will explain how to use a utility from the .NET SDK to generate serialization classes. Then, I’ll discuss validating your XML data against your schema.