By Tim Landgrave

One of the CIO’s key responsibilities is to manage the electronic assets of the corporation. The CIO has to provide security mechanisms to protect the assets, access mechanisms to allow users to retrieve assets quickly, and the applications and tools required to manipulate these assets. While these mechanisms are well defined and easily understood for data assets (databases, user IDs, physical files), they’re not nearly as clear for information assets (spreadsheets, word processing documents, drawings, etc.). In this article, I’ll look at the problems that arise when managing information assets using current technology and the roadmap for future solutions that will change the way we think about them.

The traditional document management (DM) system
When using the term “information assets,” I’m referring specifically to the work product created by a company’s information workers (IW). Although IW use corporate systems that manage the company’s data, their ultimate value lies in their ability to organize, analyze, categorize, and sanitize this data into a format that’s easily consumed by managers, customers, partners, and coworkers. The typical result of the IW’s efforts is captured in the proprietary format supported by a productivity application. Typical examples include Microsoft Office or Lotus SmartSuite for word processing and spreadsheet documents and specialized applications such as Visio, AutoCAD, or Photoshop for graphics documents.

Corporate DM solutions that manage these assets range from very simple to highly complex. In smaller companies, important documents are stored in shared areas on a central server (if the IW remember to save them there). Larger companies may implement more advanced DM solutions that include workflow management and document versioning, which allow multiple people to work on documents at the same time and manage the editing process. However, no existing DM solution manages the reuse of finished documents. In fact, most of them don’t even contemplate it. Why? Because proprietary document formats were designed for efficient storage, not for efficient retrieval and reuse.

The best that existing DM systems can do is help users find documents based on relationships in the content generated either by keyword tagging or by filters that allow the system to index the content when storing it. And different file formats support different methods of internal tagging that make it difficult for documents from different vendors to be included in a standard display within a search result. To be fair, the real problem is a limitation imposed by nonstandard file formats, which makes it impossible for existing DM systems to consider documents as anything more than a pile of words with entry points defined by keywords or index entries.

The solution: XML file formats
The real solution lies in changing the way CIOs look at these information assets. Rather than seeing them as a pile of words, CIOs need a way to add intelligence to the documents themselves. There’s no reason why these documents can’t have tags that define standard sections, formatting, definitions, and processing directives. Any kind of data asset can then be placed inside these internal tags: text or data created by the IW, or links to existing corporate data that the IW have formatted for inclusion in the document. Once defined in this way, any document could reuse sections of another document by referencing the document name and the tag that identifies the section. Moreover, sections of documents could have their own sets of permissions and access rights, giving administrators more granular control of the information in a given document. And by applying universally understood security tags at the document level, systems could allow finer-grained document permissions, such as the ability to view a document but not copy any section, or the ability to view the document but not print it.
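To make the idea concrete, here is a minimal sketch in Python of what tagged sections, cross-document reuse, and section-level permissions might look like. The element and attribute names (document, section, include, readers, allow) are invented for illustration and do not come from any vendor's file format; the in-memory DOCUMENTS dictionary stands in for a real document store.

```python
import xml.etree.ElementTree as ET

# Hypothetical document store: document name -> XML source.
# The tag vocabulary below is an assumption made for this sketch.
DOCUMENTS = {
    "q3-report": """
<document name="q3-report" allow="view">
  <section id="summary" readers="managers,partners">Revenue grew in Q3.</section>
  <section id="forecast" readers="managers">Q4 outlook is flat.</section>
</document>""",
    "partner-brief": """
<document name="partner-brief" allow="view,print">
  <section id="intro" readers="partners">Welcome, partners.</section>
  <include doc="q3-report" section="summary"/>
  <include doc="q3-report" section="forecast"/>
</document>""",
}


def load(doc_name):
    """Parse a stored document into an element tree root."""
    return ET.fromstring(DOCUMENTS[doc_name].strip())


def find_section(doc_name, section_id):
    """Locate a section by document name and section tag id."""
    for section in load(doc_name).iter("section"):
        if section.get("id") == section_id:
            return section
    raise KeyError(f"{doc_name} has no section {section_id!r}")


def can_read(section, role):
    """Section-level permission check against the 'readers' attribute."""
    return role in section.get("readers", "").split(",")


def is_allowed(doc_name, action):
    """Document-level permission check (for example 'view' or 'print')."""
    return action in load(doc_name).get("allow", "").split(",")


def render(doc_name, role):
    """Return the text of every section the given role may read,
    resolving <include> references to sections in other documents."""
    parts = []
    for child in load(doc_name):
        if child.tag == "include":
            section = find_section(child.get("doc"), child.get("section"))
        elif child.tag == "section":
            section = child
        else:
            continue
        if can_read(section, role):
            parts.append(section.text.strip())
    return parts
```

In this sketch, rendering partner-brief for a partner pulls in the shared summary from q3-report but silently omits the forecast, because that section is readable only by managers. The reused text lives in one place, and the permissions travel with the section rather than with the file.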

XML content lifecycle management
As with any new technology, analysts have already given these features a name and made grandiose projections about how quickly the technology will be adopted. Industry analysts have grouped information management solutions that use XML as the internal data format into the broad category of XML content lifecycle management (CLM) solutions. This broad categorization refers to any system in which information assets become “queryable” and can be tagged at the document or section level using XML. Analysts have predicted that these solutions will grow by over 40 percent through 2006 (IDC) and that by 2008 the market for these solutions will exceed $11 billion (ZapThink). Whether or not you believe the growth and market size estimates, you have to recognize that the move from proprietary file formats and limited DM systems to file formats based on XML standards and full-featured CLM systems is inevitable.
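As a rough illustration of what “queryable” could mean in practice, the Python sketch below searches a small collection of tagged documents for every section on a given topic, regardless of which document contains it. The topic attribute, the document names, and the LIBRARY list are assumptions made for the example, not part of any CLM product.

```python
import xml.etree.ElementTree as ET

# Hypothetical library of XML-tagged documents (illustrative content only).
LIBRARY = [
    '<document name="q3-report">'
    '<section topic="revenue">Revenue grew in Q3.</section>'
    '<section topic="staffing">Hiring is paused.</section>'
    '</document>',
    '<document name="board-memo">'
    '<section topic="revenue">Revenue targets were met.</section>'
    '</document>',
]


def query_sections(topic):
    """Return (document name, section text) pairs for every section
    tagged with the requested topic, across the whole library."""
    hits = []
    for source in LIBRARY:
        root = ET.fromstring(source)
        # ElementTree's limited XPath supports attribute-value predicates.
        for section in root.findall(f".//section[@topic='{topic}']"):
            hits.append((root.get("name"), section.text))
    return hits
```

A keyword index could find both documents, but it could not hand back just the revenue sections; the section-level tags are what make that possible here.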

Removing the last stumbling block
To the chagrin of most of its competitors, Microsoft will be primarily responsible for ushering in the age of XML-based CLM systems. Why? Because until the world’s largest vendor of content-creation solutions provides a way to make its documents available natively in XML format, the whole market will be at a standstill. This summer, Microsoft will release its next version of Office and a host of supporting products (collectively called the Office System) that will enable companies to begin creating their own CLM solutions. The new versions of Word and Excel will use XML as their native file store, and other Office System products, such as Windows SharePoint Services, will make it trivial to create simple CLM solutions. As other vendors work to migrate their file formats and DM solutions to take advantage of the new Microsoft capabilities, this market will begin to accelerate gradually. The biggest inhibitor to market growth will then be corporations that resist upgrading to the new Office System or to applications from other vendors that support these new XML-based CLM solutions.