Thunder, a supercomputer recently installed at Lawrence Livermore National Laboratory, is possibly the second-most powerful computing machine on the planet—and it was built by a company with about as many employees as a real estate office.
California Digital, a 55-person company located on the outskirts of Silicon Valley, created Thunder from 1,024 four-processor Itanium 2 servers to perform a variety of tasks at the lab. Capable of churning through 19.94 trillion calculations per second, it would have ranked second on the Top 500 list of supercomputers, published twice a year by the University of Mannheim, the University of Tennessee and Lawrence Berkeley National Laboratory, had it made the list's deadline.
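As a rough sanity check on that figure, the theoretical peak of such a cluster can be estimated from its part counts. The sketch below assumes the 1.4GHz Itanium 2 variant retiring four floating-point operations per cycle; neither assumption is stated in this article, and the 19.94-teraflop number quoted above is a measured benchmark result, which normally comes in somewhat below peak.

```python
# Back-of-the-envelope peak estimate for a 1,024-server, four-CPU-per-server
# cluster like Thunder. Clock rate and flops-per-cycle are assumptions.

servers = 1024
cpus_per_server = 4
clock_hz = 1.4e9        # assumed: 1.4 GHz Itanium 2
flops_per_cycle = 4     # assumed: two fused multiply-adds retired per cycle

total_cpus = servers * cpus_per_server                  # 4,096 processors
peak_flops = total_cpus * clock_hz * flops_per_cycle    # theoretical peak

print(f"{total_cpus} processors, ~{peak_flops / 1e12:.1f} teraflops peak")
```

Under these assumptions the theoretical peak works out to roughly 22.9 teraflops, consistent with a measured figure just under 20.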
The key to the setup, and many like it, is to use the Linux operating system to lash together a lot of comparatively cheap, off-the-shelf hardware to quickly create computers with enough power to simulate the potential effects of explosions or crunch data on galaxy formation. The machines can cost millions of dollars, but they're still about a third less expensive than traditional supercomputers of comparable power.
"The tier-one vendors don't have as much of a handle on this market as other areas," said Douglas Bone, president of Fremont, Calif.-based California Digital, which has also installed large Linux clusters for several Fortune 500 companies. Other small companies are involved in the nascent field as well.
Utah's Linux Networx, for instance, is building two supercomputing clusters based on Advanced Micro Devices' Opteron processor for the Los Alamos National Laboratory: A 2,816-processor cluster will be used to study nuclear stockpiling, while a smaller 512-processor cluster will be dedicated to smaller problems with lower security clearances. The company is also creating a cluster with 2,132 Intel Xeon processors for the U.S. Army Research Laboratory.
Other companies in the market include Poway, Calif.-based ProMicro; Poland's Optimus; and Verari Systems, formerly called RackSaver. Component specialists such as Mellanox and SuperMicro are also participating.
"They are getting serious traction," said Robert Pennington, interim director of the National Center for Supercomputing Applications. "There were a couple of small companies trying to do this about five years ago, but they were just testing it."
The NCSA has had two major Linux clusters installed and is currently considering bids on another large cluster.
Sophisticated, but affordable
The turn away from monolithic machines stuffed with proprietary hardware and software built by companies such as IBM, Cray and NEC has come about through a combination of technological sophistication and appealing prices.
While computers such as NEC's Earth Simulator are still preferred for some tasks, such as weather prediction, researchers have found that most applications can be run on clusters of two- and four-processor assemblies from Intel or Advanced Micro Devices running Linux.
(Large clusters, including a much-heralded one at Virginia Polytechnic Institute, have also been built out of Apple machines running IBM PowerPC chips. Srinidhi Varadarajan, the brains behind Virginia Tech's cluster, also serves as California Digital's chief technology officer.)
Many of these cluster contracts still go to large computer makers. Dell and IBM installed the first two NCSA clusters, and Hewlett-Packard has also signed several contracts. But small companies are undeniably winning many prestigious projects.
Increased familiarity at the labs with these types of systems has cut implementation times and costs. Lawrence Livermore, for example, has hired its own Linux kernel and compiler experts to speed the shift to clusters.
"We're not buying a solution. We buy the pieces individually and act as the general contactor," said Mark Seager, assistant department head for advance technology at Lawrence Livermore. "By doing this, we are getting a huge price-performance boost, by a factor of two or three."
Although the core servers are built around standardized components, heavy-duty technical expertise is required to jump into this market. Linux Networx helps customers decide how many processors and how much memory to install, as well as what sort of interconnect technology—InfiniBand, Gigabit Ethernet, Quadrics' QsNet, or Myricom's Myrinet—will be most appropriate. And it assembles and tests the servers and interconnects in its own factory before shipping them for easy reassembly at the customer site.
Software also is a huge component of the contracts.
"As the watermark of commodity, off-the-shelf technology rises, the area of our differentiation also moves up. More and more of the percentage of system value (is) from those things that go around the hardware," said Bernard Daines, founder and CEO of Linux Networx.
"There is a fair amount of sophistication needed to truly understand what is going on, on the software side of a cluster and to eliminate latency on the network," said Bone of California Digital. "There is no technical reason why someone could not figure out the intricacies of clusters, but it requires a methodological accumulation of expertise of low-latency interconnects, management tools and other things."
Technical expertise aside, these smaller companies are also getting a boost from government policies designed to help the little guy.
"As a government organization, we have to help small corporations," said Lawrence Livermore's Seager. "We feel this is something we should encourage. Being small isn't a killer as long as they are qualified."
A short history of clusters
The cluster paradigm—independent machines connected with a high-speed network—has been around for more than 15 years, according to Dave Turek, leader of IBM's "Deep Computing" team. "What's changed in recent years is that they can be assembled using Linux, Intel or AMD processors and conventional networks instead of exotic, rare or customized technology."
Many say the shift began in 1999 and 2000. At that time, mass-produced and relatively inexpensive Intel chips began to surpass RISC chips in performance, according to Intel executives, a gap that has continued to widen. Linux and Beowulf clustering technology, which allows Linux boxes to be tied together, have also become more widespread.
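The core pattern a Beowulf-style cluster exploits is dividing a large computation into independent chunks, farming each chunk out to a different machine, and combining the partial results. The sketch below shows that divide-and-distribute shape using Python's standard multiprocessing pool on a single machine as a stand-in for many nodes; a real Beowulf cluster would do the same thing with MPI processes spread across the network.

```python
# Divide-and-distribute sketch: split a sum of squares across worker
# processes, as a single-machine stand-in for nodes in a cluster.
from multiprocessing import Pool

def partial_sum(chunk):
    """The work one 'node' performs on its slice of the range."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))

N, workers = 1_000_000, 4
step = N // workers
chunks = [(i * step, (i + 1) * step) for i in range(workers)]

# Note: on platforms that spawn rather than fork new processes, this
# pool usage should sit under an `if __name__ == "__main__":` guard.
with Pool(workers) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total)
```

Because the chunks share no state, adding nodes speeds the job up almost linearly—the property that lets commodity boxes collectively rival a monolithic supercomputer.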
Lawrence Livermore created its first Linux-Intel cluster in the late 1990s. The machine contained Pentium II chips, which limited memory bandwidth, but Seager said the lab knew the setup would evolve. The introduction of the Pentium 4 became a watershed moment for the lab by expanding bandwidth from 800 megabytes per second to 2.4 gigabytes per second, Seager said.
To some degree, these smaller companies followed the labs' lead.
"We were building these clusters by hand. They were offering to put these things together for us. They would do the assembly but not provide much support," said the NCSA's Pennington.
California Digital was founded in 1994 and initially specialized in relatively generic Intel servers. In June 2001, it bought the hardware unit of VA Linux and refocused itself on the high-performance computing market.
The Thunder project then came along after Lawrence Livermore researchers were impressed with manageability tools California Digital released to the open-source community. By chance, the company had just completed an Itanium 2 project for a large corporate client—"one of the biggest corporations in the world," Bone said—that required California Digital to build a cluster that could run 21 different applications, an unusually large number.
Breaking into this market isn't easy, though. "This is a very tight-knit community," said Jason Waxman, director of multiprocessor platform marketing at Intel. Not only do companies have to have technical sophistication, "they have to know what it takes to win a bid and deal with government contracts."
"This is not a field where you can walk in off the street. You have to have some credibility," Pennington added.
The tight-knit nature of the market also involves to some extent the secrecy of various computing projects. Waxman knows of a relatively new start-up formed by refugees from one of the classic supercomputer makers, but he said he couldn't reveal the name. California Digital's Bone said he couldn't reveal the names of any corporate customers. Pennington, meanwhile, refused to divulge the scope of NCSA's current proposal request.
Despite these challenges, however, the growth opportunities for these companies seem strong.
"In the future, we will be getting more Linux clusters," said Seager.
CNET News.com's Stephen Shankland contributed to this report.