JP Morgenthal & Priscilla Walmsley

Mining metadata requires a carefully thought-out approach, one that identifies which metadata components are critical to a particular goal.

For example, if a system needs to be integrated with a new e-commerce system, the metadata gathered should pertain to the components used for procurement and sales. In that case, identifying the total number of accounts that are delinquent more than 30 days is less important than identifying the number of units in stock for a particular product.

Prioritizing which metadata to identify limits the costs of gathering and managing the base of corporate metadata.

This brings us to our next important point regarding metadata: what to do with it once it has been defined or extracted.
This article concludes our series on metadata. The first article described the role metadata plays in the enterprise and why you should care about it. Last week, the second installment provided a valuable list of places to mine for metadata. This content originally appeared in Wiesner Publishing’s Software Magazine and appears on TechRepublic under a special arrangement with the publisher.
Making metadata accessible
A company’s commitment to capturing metadata involves two additional decisions:

  1. Where the metadata will be stored
  2. How the metadata will be made available to those who need it

In terms of storage, the most obvious answer is to use a metadata repository. This is a specialized database application designed to provide the infrastructure and support for storage of interrelated components of information.

As stated earlier, very few, if any, metadata components stand on their own. Metadata repositories capture information not only about individual metadata components, but also about the relationships between them. They also provide functionality for searching and browsing the available metadata, and they deliver one of the more important functions of all: producing impact analyses.

Impact analyses identify all the resources that rely on a particular piece of metadata and, therefore, help define all the resources that would be affected by a change in the location or type of data associated with a metadata component. Producing these reports, however, requires a commitment to entering and updating all the information in the repository.
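To make the idea concrete, here is a minimal sketch of an impact analysis in Python. It assumes the repository can be reduced to a simple map from each component to the resources that rely on it directly; the component and resource names are hypothetical, and a real repository product would store far richer relationships.

```python
from collections import deque

# Hypothetical repository contents (an assumption for illustration): each key
# is a metadata component or resource, and each value lists the resources
# that rely on it directly.
DEPENDENTS = {
    "customer.account_id": ["billing_report", "crm_export"],
    "billing_report": ["finance_dashboard"],
    "crm_export": ["marketing_feed"],
    "product.stock_count": ["inventory_page"],
}


def impact_analysis(component, dependents=DEPENDENTS):
    """Return every resource affected, directly or indirectly, by a change."""
    impacted = []
    seen = {component}
    queue = deque([component])
    # Breadth-first walk: direct dependents first, then their dependents.
    while queue:
        current = queue.popleft()
        for resource in dependents.get(current, []):
            if resource not in seen:
                seen.add(resource)
                impacted.append(resource)
                queue.append(resource)
    return impacted


print(impact_analysis("customer.account_id"))
# ['billing_report', 'crm_export', 'finance_dashboard', 'marketing_feed']
```

The report is only as complete as the relationships recorded: a resource missing from the map never shows up in the result.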

How can everyone find it?
For the second decision, how to make the metadata available, there is no single answer. The right answer depends on how the metadata itself will be used.

For example, simple browsing for informational purposes can be provided in HTML for use by a Web browser. A direct application programming interface (API) may be provided for use by a proprietary client that offers more advanced querying and metadata management. However, perhaps the most innovative method for distributing metadata is to provide it as an Extensible Markup Language (XML) document.

XML is a powerful language for representing both metadata and data combined in the same document. XML is a tag-based language that allows users to demarcate items of data through the use of named tags, called elements. In addition, elements can contain supplemental information, called attributes, which assign values to uniquely named keys.

An XML example
The following sample XML document illustrates these points:

From this example, we can see how these elements clearly define the data that they are demarcating, and provide additional metadata that helps to clarify the information. For example, on Total_Price, identifying the currency for the amount ensures that processing will occur in U.S. dollars.
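As a rough illustration of the kind of document described above, the following Python sketch parses a small order document with the standard library. Only the Total_Price element and its currency come from the text; the other element names, the attribute names, and all values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document. Apart from Total_Price and its currency,
# every name and value here is an assumption for illustration.
ORDER_XML = """
<Order id="1001">
  <Item sku="A-17">
    <Description>Widget</Description>
    <Quantity>3</Quantity>
  </Item>
  <Total_Price currency="USD">59.85</Total_Price>
</Order>
""".strip()

root = ET.fromstring(ORDER_XML)
total = root.find("Total_Price")

amount = float(total.text)        # element content: the data
currency = total.get("currency")  # attribute: metadata about the data

print(f"{amount:.2f} {currency}")
# 59.85 USD
```

The element name says what the value is, and the attribute says how to interpret it, so a consumer does not have to guess the currency.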

Whether metadata is captured with the data in the document, as illustrated here, or the document is the metadata itself, XML is a simple and platform-neutral data format that can easily be processed by a number of tools and products.
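The second case, where the document is the metadata itself, can be sketched the same way. The snippet below builds a small XML description of two fields; the Metadata and Field element names and the attribute set are hypothetical, not a standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical field descriptions (assumptions for illustration).
FIELDS = [
    {"name": "Total_Price", "type": "decimal", "unit": "currency"},
    {"name": "Quantity", "type": "integer", "unit": "pieces"},
]


def describe_fields(fields, source="orders"):
    """Serialize field descriptions as an XML document holding only metadata."""
    root = ET.Element("Metadata", {"source": source})
    for field in fields:
        # Each key/value pair in the dict becomes an attribute of <Field>.
        ET.SubElement(root, "Field", field)
    return ET.tostring(root, encoding="unicode")


metadata_xml = describe_fields(FIELDS)
print(metadata_xml)
```

Any tool that can read XML can then consume the description, regardless of the platform that produced it.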

Using metadata for application integration
An emerging area that relies heavily on the availability of metadata is application integration. Interest in it is rising because integration engines require metadata to automate the extraction of data.

Integration engines simplify the extraction and aggregation of data from disparate sources for the purpose of supplying data to other applications. However, these engines operate only if they can access the metadata for an existing application and expose it to the user.

For example, many integration engines can extract the metadata from database systems because there are well-defined interfaces for extracting this type of information. The schemas and data type information can then be provided to the integration engine user, who can decide which fields need to be input to another system (using more metadata). From this definition, the integration engine will then automate the extraction and update process.
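The flow just described can be sketched with Python's built-in sqlite3 module. This is not how any particular integration engine works; it simply shows schema metadata being read through a well-defined interface (SQLite's PRAGMA table_info) and a user-supplied field mapping driving the extraction. The table, the columns, and the target field names are assumptions for the example.

```python
import sqlite3

# Hypothetical source system: an in-memory database with one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT NOT NULL, stock INTEGER)"
)
conn.execute("INSERT INTO products VALUES ('A-17', 'Widget', 3)")


def table_schema(conn, table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk.
    # The table name is interpolated, so it must come from a trusted source.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


def extract(conn, table, field_map):
    """Pull the mapped columns and rename them for the target system."""
    available = {name for name, _ in table_schema(conn, table)}
    missing = set(field_map) - available
    if missing:
        raise KeyError(f"columns not found in {table}: {sorted(missing)}")
    columns = ", ".join(field_map)
    rows = conn.execute(f"SELECT {columns} FROM {table}").fetchall()
    return [dict(zip(field_map.values(), row)) for row in rows]


# More metadata: the user's choice of source fields and their target names.
FIELD_MAP = {"sku": "item_code", "stock": "quantity_on_hand"}

print(table_schema(conn, "products"))
# [('sku', 'TEXT'), ('name', 'TEXT'), ('stock', 'INTEGER')]
print(extract(conn, "products", FIELD_MAP))
# [{'item_code': 'A-17', 'quantity_on_hand': 3}]
```

Checking the mapping against the schema before extracting is the point of the exercise: the mapping is only usable when the metadata confirms that the source fields exist.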

Considering the complexity associated with writing the software to perform the actual extraction and update, these engines offer significant assistance in this process. However, without knowing where the data is located and what the data means, the integration engine is not a very useful tool.

You need a plan
Metadata is one of those concepts that has been discussed for more than a decade, but only recently have we seen the emergence of tools that support and are driven by metadata. Many companies have spent a significant amount of money chasing their tails trying to gather up their metadata, much like a squirrel collecting nuts for winter.

However, once the tiller comes and tills the field, those nuts are lost. The same can be said of using a metadata repository without a complete plan for distribution, access, and maintenance of the data in the repository.

JP Morgenthal is CTO of XMLSolutions Corp., McLean, VA, and a leading expert in the area of enterprise application integration and business-to-business e-commerce. Morgenthal is also co-author of Manager’s Guide to Distributed Environments and Enterprise Application Integration with XML and Java.

Priscilla Walmsley is VP of Development for XMLSolutions. She is a leading authority on metadata and repositories. Walmsley helped develop Platinum Software’s Metadata Repository product and Microsoft’s repository.
