Good schema management helps to maintain XML namespace

As XML proliferates in the workplace, the tendency for developers to roll out their own XML schemas can lead to a quagmire of competing standards. Effective management of schemas can greatly reduce this potentially expensive problem.

By Kurt Cagle

What is an invoice?
If you take a look at the vast number of different XML formats being used to describe something as intuitively simple as an invoice form, you can see one of the problems that is now beginning to crop up in many companies—namely, that everyone has a different idea of what makes up an invoice, a purchase order, an employee form, or any of the thousands of business “objects” that cross the boundaries between programming and commerce. To help you better manage this process, I’ll take a look at how to develop a good strategy for schema management and what to look for in someone who is going to manage schemas.

The problem of too many choices
As creating schemas from XML sources becomes simpler, programmers and application designers tend to custom-build a schema that meets an immediate need. Most developers try to coordinate their actions within their immediate work group, but in larger companies, they may not be aware of parallel work being done by developers in a different part of the company, or of schema definitions that already exist. Eventually the applications get written and deployed, and then comes the sinking realization that two different applications are using two incompatible versions of the same (or overlapping) objects.

As Web services architectures become more pervasively deployed in intradepartmental applications, which seem to be where the Web services wave is currently strongest, this kind of scenario will occur more and more often. The problem is even worse than it may appear on the surface, because there are often times when two schemas describe overlapping parts of a specific business process or transaction, so it is often not just a case of sending out a memo asking whether someone has developed an invoice schema. In the language of programmers, the company namespace has become compromised.

XML schema manager—the gatekeeper
Most IT departments currently have a database administrator (DBA). The role of this person differs from that of a developer; the administrator acts as a gatekeeper to ensure that the database stores retain their integrity. Thus, the DBA often has to act as a referee to ensure that database programmers aren't creating unnecessary (and potentially divisive) tables, to determine which stored procedures do not have the potential of corrupting the databases, and to ensure that only authorized people can get access to the database.

Within the industry, an XML counterpart to the DBA is emerging, and it’s called an XML schema manager, or XSM. The principal role of an XSM is part librarian, part referee, and part researcher. The XSM is responsible for the repository of XML schemas that a company generates. Whenever a particular schema needs to be incorporated into an application, the schema manager should approve the use of that schema or suggest alternatives and should try to limit or eliminate those schemas that duplicate at least some existing functionality.
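
To make the librarian-and-referee role concrete, here is a minimal sketch of a schema repository that reports when a newly submitted schema declares top-level elements already declared by a registered one. It is only an illustration: the SchemaRegistry class, the two sample schemas, and the urn:example namespaces are hypothetical, and a real repository would compare far more than element names.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Two hypothetical departmental schemas that both define an "invoice".
BILLING_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:billing">
  <xs:element name="invoice" type="xs:string"/>
  <xs:element name="customer" type="xs:string"/>
</xs:schema>"""

SHIPPING_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:shipping">
  <xs:element name="invoice" type="xs:string"/>
  <xs:element name="manifest" type="xs:string"/>
</xs:schema>"""


def top_level_elements(xsd_text):
    """Return (target namespace, set of globally declared element names)."""
    root = ET.fromstring(xsd_text)
    names = {el.get("name") for el in root.findall(f"{{{XSD_NS}}}element")}
    return root.get("targetNamespace"), names


class SchemaRegistry:
    """Hypothetical in-memory repository keyed by target namespace."""

    def __init__(self):
        self._schemas = {}

    def register(self, xsd_text):
        """Store the schema and return overlaps with earlier schemas.

        The result maps each already registered namespace to the sorted
        element names it shares with the new schema (empty if none).
        """
        namespace, names = top_level_elements(xsd_text)
        overlaps = {
            other: sorted(names & existing)
            for other, existing in self._schemas.items()
            if names & existing
        }
        self._schemas[namespace] = names
        return overlaps


registry = SchemaRegistry()
first = registry.register(BILLING_XSD)    # nothing registered yet
second = registry.register(SHIPPING_XSD)  # shares "invoice" with billing
print(first, second)
```

An overlap report like this does not decide anything by itself; it simply gives the schema manager a prompt to ask whether the second invoice definition is really needed.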

Corporate namespace strategy
All companies should determine what the corporate namespace is and how it should be maintained. Such a strategy should look at a number of specific points where XML schemas can cause problems. These include:
  • Industry compatibility: Any internal schema should be seen as a possible candidate for external Web services interchange once the intraenterprise infrastructure has been established. For this reason, it is worth looking at the existing literature to see what the de facto standard is for that industry group. The advantage to this comes from the fact that the closer a company standard is to an existing global standard, the easier it will be to write code that will transform one to the other for B2B compatibility.
  • Modularization: It is far better to create a schema from existing schema parts than it is to try to create entire schemas from scratch. Not only does this make it easier to maintain such schemas, but the schemas can also drive the modeling of programming class hierarchies, reducing the amount of custom code needed to manipulate the XML within applications.
  • Versioning: As a company evolves, so too do the requirements it places upon its schemas and other software. A schema manager is responsible for ensuring that different versions of a given schema don't create forks in a company's applications, a concern that's especially high in a distributed Web services environment.
  • Ease of validation: The buzzword acronym in XML circles right now is PSVI (post-schema-validation infosets). An infoset is an abstract representation of the XML contents, independent of the syntactical markup symbols. As an XML parser validates the XML data against its associated schema to ensure that all of the data is valid and the structure is sound, it can also convert the data internally to the datatypes specified in the schema. In essence, the resulting infoset has become a distinct binary object and can be manipulated in that fashion. Providing an efficient mechanism for ensuring that such schemas are readily accessible and consistent is another role that a schema manager will likely play, especially as PSVI data from Web services become the norm.
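
The ease-of-validation point, that validation can also turn text into typed data, can be illustrated with a small sketch. This is not a real schema validator and does not produce a true PSVI: the INVOICE_TYPES mapping is a hypothetical stand-in for the datatypes a schema would declare, and the typed_infoset function and sample invoice are invented for the example.

```python
from datetime import date
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: element name -> datatype converter.
INVOICE_TYPES = {
    "number": int,                  # plays the role of xs:integer
    "issued": date.fromisoformat,   # plays the role of xs:date
    "total": Decimal,               # plays the role of xs:decimal
}


def typed_infoset(xml_text, types):
    """Check child elements against `types` and return typed values.

    Raises ValueError for unexpected elements, missing elements, or
    text that cannot be converted to the declared datatype.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag not in types:
            raise ValueError(f"unexpected element: {child.tag}")
        try:
            result[child.tag] = types[child.tag](child.text or "")
        except (ValueError, ArithmeticError) as exc:
            raise ValueError(
                f"invalid value for {child.tag}: {child.text!r}"
            ) from exc
    missing = set(types) - set(result)
    if missing:
        raise ValueError(f"missing elements: {sorted(missing)}")
    return result


doc = (
    "<invoice><number>1042</number>"
    "<issued>2002-05-17</issued><total>199.95</total></invoice>"
)
info = typed_infoset(doc, INVOICE_TYPES)
# The values are now an int, a date, and a Decimal rather than strings,
# so the application can do arithmetic and date logic on them directly.
print(info["total"] * 2, info["issued"].year)
```

The design point is that one pass over the document both rejects bad data and hands the application typed values, which only works if every application resolves the same, consistent schema.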

Skill set for an XSM
A good schema manager is a person who is familiar with both the vertical industry schemas and the horizontal application schemas currently in use, and many also tend to be involved in standards development with one of the large trade organizations (such as OASIS or the W3C). Moreover, XML is most interesting when it is moving. A competent schema manager should:
  • Be familiar with Web services architectures in varying flavors (.NET, Java, and a whole host of others within the open source realm), SOAP, and WSDL.
  • Know about traditional databases to understand how the schema specifications within existing data stores can affect the XML schemas that will likely represent this data in transition.
  • Know such languages as XPath and XSLT.

An enterprise is only as effective as its information flow. As more of that information is expressed in the new language of computing (XML), the schema manager will become one of the most vital people in your organization.

Kurt Cagle is an author and writer specializing in Web technologies, open source, Java, and .NET programming issues. He is also the owner of Cagle Communications, a software development firm in Kirkland, WA, and produces a free e-newsletter, The Metaphorical Web.
