By Michael Kanellos
Staff Writer, CNET News.com
Data mining, the ability to find unexpected patterns in accumulated data, was born during a lunch break.
At a customer conference in the early 1990s, an executive at British department store chain Marks & Spencer was explaining his database woes to Rakesh Agrawal, an information retrieval specialist at IBM. The store was collecting all sorts of data but didn't know what to do with it.
So Agrawal and his team began devising algorithms for asking open-ended queries, eventually authoring a 1993 paper that would become required reading in data-mining science. The report has been cited in more than 650 other studies, making it one of the most widely cited papers of its kind.
"We were not even sure we should send it, because we thought people might think it was too simple-minded," said Agrawal, who holds the prized internal title IBM Fellow.
The anecdote illustrates why IBM spends more than $5 billion annually on technological research. With more than 3,000 people in its dedicated research division and nearly two-thirds of its 300,000 employees overall working in some sort of technical capacity, IBM has a scientific backbone that allows it to create, and lead, new or developing markets.
IBM creations include the programming language Fortran, the magnetic disk drive, superconductivity and the Data Encryption Standard, not to mention the in 1928, the first public address system.
Big Blue's historic achievements have grown out of its culture as a quasi-university, complete with its own system of recognition. The company acknowledges every employee's third patent by elevating him or her to a new plateau. Invitations to become a Master Inventor, Distinguished Engineer or the pinnacle, an IBM Fellow, are coveted.
It is a natural progression for many new recruits from universities, which receive millions of dollars in research funding annually from IBM and serve as a pipeline to future employment both in the United States and abroad. In China, IBM's Beijing labs receive applications from about 1,800 Ph.D. or master's candidates each year. It hires a dozen.
"Companies that have a day-to-day involvement have a higher probability of landing students," said , dean of the School of Engineering at Stanford University, adding that recruitment is one of the big benefits for IBM in these programs. Hundreds of Stanford Ph.D. graduates have joined the company.
Yet one of the most important reasons behind IBM's success in luring the best and the brightest lies on the business side of its operations, far from the laboratory and other academic settings. Big Blue has intentionally blurred the line where research ends and product development begins, understanding that such versatility appeals to technologists at all levels and inspires better work.
Making it happen
Unlike some of the large corporate labs of previous generations, IBM has devised methods to better capitalize on inventions from its labs and to ensure that research projects target what customers want. Along the way, inventors and researchers take more ownership of their projects because they can follow them through the commercial process.
"They reined in the labs about 10 years ago and made it clear to people that they had to have a strong partnership with people in the product development groups," said Randal Bryant, dean of the School of Computer Science at Carnegie Mellon University. "That 'throwing it over the wall' approach never works well."
As a result, a number of researchers leave their divisions to shepherd their technologies through the development process every year.
Nelson Mattos exemplifies that process. Before coming to IBM in 1992, the Brazilian native was a professor at the University of Kaiserslautern in Germany. Now, with the honorary title of Distinguished Engineer, he is director of the Information Integration software group while continuing to advise IBM interns on Ph.D. projects.
Others have taken similar paths. Anant Jhingran served as director of Computer Science at the IBM Almaden Research Center and the senior manager for e-commerce and data management at the company's Watson Research Center before jumping to product development a few years ago.
Today he is working on ways to combine database and Web searches. "The windows between research and development are fairly thin," he said.
In the program, for example, researchers from facilities in Silicon Valley, New York and six other locations are assigned to work on thorny aspects within ongoing contracts. Likewise, the Information Integration Leadership Board is made up of corporate customers that advise IBM on features in demand for upcoming database products.
Product designers are often required to watch as customers try beta versions, said Laura Haas, an IBM Distinguished Engineer and development manager of DB2 Information Integrator. Recently, she served as a beta tester for a project created by one of her subordinates.
"I crashed the system," she recalled. The developer cringed, she said.
The commercialization process isn't perfect. IBM invented the relational database, but Oracle became the first company to successfully take the concept to market. Attempts to break Microsoft's dominance in the PC industry with technologies like OS/2 and MicroChannel flopped. And RISC processors, which IBM has promoted heavily, sell in far fewer units than Intel-type chips do.
Still, "they tend to do pretty well," said Steve O'Grady, an analyst at consulting firm RedMonk. "What is different about IBM, as opposed to Bell Labs or PARC, is that they do have a lot of feet on the street in the real world. It will be interesting to see what they do with their search stuff. How do you manage vast quantities of information?"
The cutting edges
IBM is largely concentrating its efforts in five areas: , , , and .
Each of these efforts stems from an existing IBM business line. In the next several years, nanotechnologies such as and are expected to help chip designers increase the performance and reduce the power consumption of microprocessors.
"The tried-and-true path of scaling down isn't working anymore," said Robert Morris, director of IBM's Almaden Research Center, who claims that the effort to use spintronics—the precise control of tiny magnetic fields—to switch chips off and on could become "as significant as the start of the transistor 50 years ago."
Similarly, supercomputing advances inevitably trickle down to high-volume commercial servers. One of IBM's more ambitious current projects involves developing a with the University of Texas. When it emerges in 2010, the chip will perform 1 trillion operations per second.
A dominant theme of IBM's software division is data retrieval and organization. Despite the vast sums invested in databases and servers, companies still encounter major problems in getting real-time sales results or data about the same customer from different internal databases.
"People have information all over the place," Mattos said. "It is a major pain point."
Last year, IBM released , a layer of "middleware" that is already being used by an estimated 1,300 customers to pull data from various sources. Later this year, IBM will introduce an enhanced version called Masala with an add-on portion for data mining called Criollo.
So far, the results are promising. Kawasaki Motors, one of IBM's beta testers for these products, was able to create a system that could track spare parts among dealers and reduce repair times. Merrill Lynch has used it to track software licenses and cut costs.
Agrawal, the data-mining pioneer, is today working on a system that will scramble customer data in a way that will allow companies to study buying trends or other patterns while preserving strict privacy.
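One simple way to picture the idea of scrambling individual records while keeping overall trends measurable is a textbook technique called randomized response. The Python sketch below is an illustration only, not IBM's or Agrawal's system: the function names, the 75 percent truth probability and the customer data are all hypothetical. Each yes/no answer is flipped at random, so no single record can be trusted, yet the share of "yes" answers across the whole group can still be estimated.

```python
import random


def scramble(answers, p_truth, rng):
    """Keep each yes/no answer with probability p_truth, otherwise flip it."""
    return [a if rng.random() < p_truth else not a for a in answers]


def estimate_rate(scrambled, p_truth):
    """Estimate the true share of 'yes' answers from the scrambled ones."""
    observed = sum(scrambled) / len(scrambled)
    # observed = p*t + (1 - p)*(1 - t); solve for the true rate t
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


# Hypothetical data: 30 percent of 20,000 customers bought a given item.
rng = random.Random(0)
true_answers = [True] * 6000 + [False] * 14000

scrambled = scramble(true_answers, p_truth=0.75, rng=rng)
estimate = estimate_rate(scrambled, p_truth=0.75)
print(f"Estimated share of buyers: {estimate:.3f}")
```

The trade-off in a scheme like this is that a lower truth probability protects individuals more but makes the group estimate noisier, so more records are needed for the same accuracy.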
The service division is also getting its share of the results from the lab. IBM is at Michigan State University and other institutions to analyze how supply chains work and find recruits.
On a more ambitious scale, the effort, which aims to create "self-healing systems," is largely targeted at automating functions that have required human intervention.
Related efforts are focused on making machines much simpler to use. In its Beijing labs, researchers are tinkering with handwriting recognition systems for Asian languages and a digital home in which appliances—lights, alarm systems, dishwashers, computers—can be operated through voice commands.
In the United States, scientists are working on ways to merge e-mail with instant messaging and to integrate speech more smoothly in applications through . The experimental software has been downloaded by nearly 30,000 people, who in turn provide feedback. In times past, an application might be tested by only a few dozen people.
At the Almaden Research Center, researchers are gathering with and behavioral economists to better understand the sociological patterns of the workplace and thereby incorporate technologies more coherently.
Like many other IBM efforts, the work is grounded in practicality.
"We take data and build models that can be used to predict outcomes," said , director of Almaden Services research. The mantra among clients buying the fruits of this research, he said, is "show me the return on investment."