Mining for metadata

You may think it's tedious. Learn why metadata is invaluable for maintaining applications and more.


By JP Morgenthal & Priscilla Walmsley

Computer systems and applications hold valuable assets that can greatly ease the move into electronic commerce and business-to-business application integration. These assets are called metadata (data about data). They describe critical facts about your systems and applications, such as where a particular data source is located and which data types those systems and applications use.

Metadata plays a key role in helping you react quickly to new technologies, and thus in using your current systems and applications to remain competitive.

The term “metadata” appears heavily in today’s technical literature for a simple reason: the more product vendors and companies realize how important metadata is to their products and organizations, the more likely they are to expose it publicly. Making this metadata public then becomes a feature of the application, which makes the application more appealing to other groups and customers.

In this article, you’ll learn what role metadata plays in the enterprise and why you should care about it. Next week, the second article in the series will provide a valuable list of places to mine for metadata. Then, the final installment will describe how to store such data. This content originally appeared in Wiesner Publishing's Software Magazine and appears on TechRepublic under a special arrangement with the publisher.

What is corporate metadata?
While this seems a simple question, it does not have a simple answer. Metadata is the set of data that describes the locations of data sources, the data types used within applications, and dictionary-like definitions of the data being used (for example, that product number is the unique identifier for each product a manufacturer produces). But it also includes information such as the author of a word processing document, the elements and attributes of an XML (eXtensible Markup Language) document, and the names and phone numbers in the corporate directory.

Metadata can reasonably be described as data that tells us about the data we use, but in many cases the data itself can become metadata, depending on how it is used. Take the names and phone numbers in a corporate directory. If you want to call someone within the company, the name is the data for that particular use. If the information you are looking for is the person’s security code, however, the same name becomes metadata: it describes the owner of the security code.

This dual nature causes great difficulty for companies when they attempt to identify and categorize their corporate metadata. If they take a cursory approach and collect everything that might be metadata, the data set can become overwhelming and difficult to control or use. If they overanalyze, they may discard important metadata because they view it as data rather than metadata. These hurdles can be overcome, however, as later articles in this series will discuss.

Most metadata does not stand alone. That is, few pieces of metadata within an organization are meaningful without association with other metadata that supplies context. For example, how useful is “account number” as a piece of metadata if you don’t know what type of account it relates to, such as checking or investment? This need for context means metadata must be collected in related groups, and the relationships between metadata items must be captured as part of the overall metadata environment.
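To make the point about context concrete, here is a minimal Python sketch of a metadata store that records each item's relationships alongside the item itself. It is a hypothetical illustration, not a description of any particular repository product: the `MetadataElement` and `MetadataRegistry` names, and the element descriptions, are assumptions. Asking for the context of “account number” also returns the related “account type” element.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataElement:
    """One hypothetical metadata item, such as a field name and its meaning."""
    name: str
    description: str
    # Names of other metadata elements this one needs for context.
    related_to: list[str] = field(default_factory=list)


class MetadataRegistry:
    """A minimal in-memory store that keeps elements and relationships together."""

    def __init__(self) -> None:
        self._elements: dict[str, MetadataElement] = {}

    def add(self, element: MetadataElement) -> None:
        self._elements[element.name] = element

    def context_for(self, name: str) -> list[MetadataElement]:
        """Return the element plus every element reachable through its relationships."""
        seen: list[str] = []
        stack = [name]
        while stack:
            current = stack.pop()
            # Skip items already visited or never registered.
            if current in seen or current not in self._elements:
                continue
            seen.append(current)
            stack.extend(self._elements[current].related_to)
        return [self._elements[n] for n in seen]


registry = MetadataRegistry()
registry.add(MetadataElement(
    "account type", "Kind of account, such as checking or investment"))
registry.add(MetadataElement(
    "account number", "Identifier for a customer account",
    related_to=["account type"]))

for element in registry.context_for("account number"):
    print(f"{element.name}: {element.description}")
```

Storing the relationship with the element, rather than in a separate document, is what lets a tool hand back “account number” together with the context that makes it meaningful.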

Why is metadata valuable?
This question is extremely timely given the recent focus on Y2K. The Y2K problem originally arose because the memory and disk constraints of our systems led developers to store years as two digits to save space. What made the problem significantly more difficult to correct, however, was the lack of metadata describing those systems.

Metadata helps us understand our data and our systems. More than documentation about how a system runs, it tells us where the system is running and where the physical resources it uses are located. Even thoroughly documented systems still require the implementers to define this level of detail once installation is complete. With metadata available, applications become easier to maintain and, if necessary, replace. Metadata also helps us spot potential pitfalls and errors, such as a date field that cannot support a change in century.
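As an illustration of that last point, the following minimal Python sketch scans field metadata for date fields stored with only two year digits. It assumes the metadata records each date field's storage format; the `FieldMetadata` record, the `century_risks` function, the format strings, and the sample systems are all hypothetical. Without such a catalog, the same check would mean reading through source code and file layouts by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical metadata record describing one stored field."""
    system: str       # application or database that owns the field
    name: str
    data_type: str    # e.g. "date", "string", "integer"
    fmt: str = ""     # recorded storage format for dates, e.g. "YYMMDD"


def century_risks(fields: list[FieldMetadata]) -> list[str]:
    """Flag date fields whose recorded format keeps only two year digits.

    Date fields with no recorded format are not flagged by this sketch.
    """
    flagged = []
    for f in fields:
        if f.data_type == "date" and "YY" in f.fmt and "YYYY" not in f.fmt:
            flagged.append(f"{f.system}.{f.name}")
    return flagged


catalog = [
    FieldMetadata("billing", "invoice_date", "date", "YYMMDD"),
    FieldMetadata("billing", "due_date", "date", "YYYYMMDD"),
    FieldMetadata("crm", "customer_name", "string"),
]
print(century_risks(catalog))  # ['billing.invoice_date']
```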

JP Morgenthal is CTO of XMLSolutions Corp., McLean, VA, and a leading expert in the area of enterprise application integration and business-to-business e-commerce. Morgenthal is also co-author of Manager’s Guide to Distributed Environments and Enterprise Application Integration with XML and Java. Priscilla Walmsley is VP of Development for XMLSolutions. She is a leading authority on metadata and repositories. Walmsley helped develop Platinum Software’s Metadata Repository product and Microsoft’s repository.

How has metadata affected your work? Is there increasing pressure to put such a system in place? Post a comment below or send us an e-mail.