Martin LaMonica

Staff Writer, CNET

IBM is building software it hopes will make it the Google of corporate-search technology.

Big Blue has been quietly working on data storage software designed to greatly improve the ability of companies to find business documents scattered across their networks, Janet Perna, the general manager of IBM’s information management group, told CNET.

The new software, along with other information-retrieval products IBM already has, underscores the company’s shift out of low-cost hardware, notably PCs, and into higher-margin software and services. The move is meant to accelerate IBM’s transition from a relational database company into a provider of a full range of information management software, Perna said.

“We’ve grown from our roots in relational databases,” Perna said. “What’s required is an information infrastructure to not only store and manage but also search and access all sorts of information.”

The new database-related software will let corporate customers store documents in XML, or Extensible Markup Language, format, which will greatly speed up text-related queries, she said. An early, or alpha, version of the tool is being tested with about 30 customers and is expected to be completed in the second half of next year. IBM has not named the product or decided how to package it.

Relational databases are a mainstay for corporations, storing records and transactional data. But about 85 percent of business information is stored in so-called unstructured data sources such as word processing files, XML documents and images, making it hard to locate, Perna said.

Through acquisitions and massive research investments in search, IBM is quietly becoming a leader in search technologies. The goal with search is to make querying business networks as common and easy as using Google or Yahoo for Web search.

That’s an idea that appeals to Victor Martinez, manager of data administration and information access services at Kawasaki Motors. Having seen the success of Web search engines, Martinez thinks that search tools could give end users at his company a much better handle on information, notably business reports.

“Almost everyone is familiar with some search function like Google or Yahoo. So my vision is we can expose business information in a similar mode,” he said. “And we’ll have a winner because there won’t be any training.”

Too often, company employees learn about business reports through word of mouth, or they commission a report that may already be written and stored on a company server, he said. Searching through a repository of existing reports would greatly speed up the process and potentially eliminate some redundancy.

As it expands its purview, IBM will likely butt heads with Microsoft and Oracle, as well as several smaller companies that specialize in aspects of corporate search, such as text retrieval, analysts said.

Oracle further detailed on Wednesday the fruits of several years of development in content management. Oracle Files 10g, which is meant to push the company beyond its strengths in the database market, is designed to help corporate customers store, manage and eventually search for information stored as text.

Microsoft is active in search as well and has helped popularize the idea of searching on PCs. It is developing a new file system, called WinFS, meant to greatly ease the process of digging through data stored in different programs. It also sells content management software and is developing a Web search engine for its MSN Web portal to compete with Google and Yahoo.

Meanwhile, several smaller specialized firms already have text storage and retrieval software that lets business users search through company networks. These corporate-search companies include Verity, Autonomy, Fast Search & Transfer and several start-ups, according to analysts.

Searching corporate networks can be significantly more complex than searching the Web, even though the volume of information can be much smaller.

Unlike on the Web, business information can be stored in many locations and in various formats, such as spreadsheets, PDFs, Web pages and even multimedia files. Corporate customers also require a reliable storage system, such as a database, as well as tools for collaboration, security and tracking regulatory compliance.

Another significant difference from the Web is that sophisticated use of search in business networks involves collecting and correlating information from multiple sources. For example, a car manufacturer could spot a potentially dangerous and costly product defect by mining through text documents stored in both customer support e-mails and manufacturing applications.
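The car-manufacturer example can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample texts, the stopword list and the function names `term_counts` and `shared_terms` are hypothetical and do not describe IBM's software. It simply counts which terms show up in both a set of support e-mails and a set of manufacturing notes.

```python
import re
from collections import Counter

# Hypothetical sample data standing in for two separate corporate sources.
support_emails = [
    "Brake pedal feels soft after 5,000 miles",
    "Soft brake pedal and a grinding noise",
    "Radio loses presets overnight",
]
manufacturing_notes = [
    "Line 3: brake caliper torque out of spec on two shifts",
    "Paint booth humidity alarm cleared",
]

# A tiny, illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "and", "the", "on", "of", "after", "out", "two"}


def term_counts(docs):
    """Count, for each term, how many documents in one source mention it."""
    counts = Counter()
    for doc in docs:
        words = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
        counts.update(words)
    return counts


def shared_terms(source_a, source_b, min_docs=1):
    """Return terms mentioned in at least min_docs documents of both sources."""
    a, b = term_counts(source_a), term_counts(source_b)
    return sorted(t for t in a if a[t] >= min_docs and b[t] >= min_docs)


print(shared_terms(support_emails, manufacturing_notes))
```

Here the only term common to both sources is the one tied to the braking system, which is the kind of overlap an analyst would want flagged for a closer look.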

IDC estimates that the corporate-search market brought in $620 million in 2003, up 20 percent from the year before, a sign of healthy demand. As different companies vie for a piece of the market for corporate information management, a collision between traditional content management providers and specialized search and text retrieval companies is inevitable, said Sue Feldman, an analyst at IDC.

“Content management and search vendors have always coexisted together very happily. Now we’re starting to see a unification of the two,” Feldman said. “This emerging information infrastructure is really where IBM is going, as well as Oracle and probably some others.”

IBM is constructing a content management and search product line through acquisitions and by sifting through the results of its research and development labs. About 300 people in IBM research are devoted to search-related topics.

WebFountain is a research project that seeks to improve on simple text-matching search formulas and find more meaning in documents by examining the relationships between the words in a sentence. IBM also has a prototype search engine called Marvel that can even find specific scenes in video clips.

Leaving the labs
Some technologies from the labs, including WebFountain, have started to appear in products. One project, called Cinnamon, resulted in improved XML document handling in IBM’s DB2 Content Manager, which is expected to be updated in the first half of next year.

In terms of shipping products, IBM got a toehold in corporate search earlier this year when it shipped DB2 Information Integrator, which was code-named Masala. An add-on to its database, it allows businesspeople to query disparate data sources. IBM has signed on initial customers and is using Masala for text-based searches on its internal portal.

With the forthcoming XML support in its database, IBM intends to change its current method of storing XML documents, and thereby improve its text retrieval. The investment is being fueled by the explosion of XML, which is increasingly used as a lingua franca for formatting business documents, such as purchase orders and contracts.

Right now IBM and other relational database companies store XML documents, which have a tree-like structure, by breaking the documents into smaller pieces and storing them as tables. With its forthcoming XML database, IBM will store and index XML documents in a tree structure, which should greatly speed text searches.
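The difference between the two storage approaches can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how DB2 works: the sample purchase order, the row layout and the function names `shred` and `path_index` are all hypothetical. The first function breaks a document into flat table-style rows; the second keeps the tree shape and indexes text by its path from the root.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical purchase order used only for illustration.
DOC = """<order id="42">
  <customer>Acme Motors</customer>
  <item sku="A1"><name>brake pad</name><qty>4</qty></item>
  <item sku="B2"><name>oil filter</name><qty>1</qty></item>
</order>"""


def shred(root):
    """Flatten the tree into (node_id, parent_id, tag, text) rows."""
    rows = []

    def walk(node, parent_id):
        node_id = len(rows)
        rows.append((node_id, parent_id, node.tag, (node.text or "").strip()))
        for child in node:
            walk(child, node_id)

    walk(root, None)
    return rows


def path_index(root):
    """Keep the tree shape and index each text value by its path."""
    index = defaultdict(list)

    def walk(node, path):
        here = f"{path}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            index[here].append(text)
        for child in node:
            walk(child, here)

    walk(root, "")
    return dict(index)


root = ET.fromstring(DOC)
rows = shred(root)
index = path_index(root)

print(len(rows))                   # number of flat rows produced
print(index["/order/item/name"])   # one lookup by path
```

With the shredded rows, finding every item name means following parent links back together, roughly what a relational join does. With the path index, the same question is a single lookup.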

There are specialized, native XML databases already on the market, but IBM’s Perna said Big Blue’s product will have the industrial-strength performance and scale of its DB2 product.

Kawasaki’s Martinez said IBM’s technology vision is compelling because its search technology is being designed to work on different types of information and operating environments. Some niche search firms he’s looked into can only search a relatively narrow type of document, such as Web pages.

IBM’s Perna said that managing both record-related data and unstructured data, such as e-mail messages and text documents, represents the future of the data management industry.

“We very much view unstructured information evolving the same way that relational databases evolved, where companies want to have content repositories that will serve multiple applications,” she said.

IBM’s strategy is to build a full-featured information management platform, targeted at large corporate clients. Other companies are also pursuing the enterprise search market but are selling simpler products that address less complex tasks, noted IDC’s Feldman. Google’s search appliance, for example, doesn’t allow a great deal of search customization, she said.

IBM has a strong standing in the relational database market and a legacy as a mainframe database provider, but until now most of its work related to search has been in the labs. Indeed, Big Blue’s biggest challenge in the corporate search market may be its image, rather than the actual technology, said Martinez.

“You don’t think of IBM when you think of search,” he said. “That will be the trick: Will they have the marketing savvy to make it fly?”