Although most enterprises are still grappling with how to migrate legacy systems into the new world of on-demand computing and how to fund those new initiatives, companies like Cassatt represent the best hope for resolving the cost and complexity problem in the data center.
By Dan Farber
COMMENTARY — In the September issue of Esther Dyson's Release 1.0, I wrote about automating IT and mentioned a well-endowed startup, Cassatt Corp., that was developing a run-time management and execution layer for automating IT operations on commodity hardware using industry-standard operating systems, without modifying applications. In other words, Cassatt had core technology (gained through its acquisition of Unlimited Scale) to virtualize Intel-based servers, virtualize data, and allocate resources on the fly to optimize workload management and system utilization.
By abstracting the software from the hardware and inserting a layer to manage operations with fewer human resources and more reliability, Cassatt could significantly lower the cost and complexity of enterprise computing environments.
Based on my interviews with Cassatt CEO Bill Coleman (formerly CEO of BEA) while the company was in stealth mode and looking at other companies claiming similar capabilities, I generously dubbed Cassatt a potential Google of IT automation. Just as Google led the search revolution, Cassatt could lead a revolution in IT administration, automating and managing the mundane tasks that can account for 70 percent of infrastructure costs.
The company launched its first official product, called Collage 2.0, this week. (The preexisting Unlimited Scale product, Unlimited Linux, qualified as the 1.0 version.) It's difficult to assess today the impact of Collage or any other new software platform injected into a very convoluted market, but at minimum it will focus attention beyond script-based, passive management schemes toward the promised land of deep, holistic automation in which tasks are outsourced to machines rather than people.
Collage is essentially a virtualization and automation platform for managing large-scale clusters of Intel machines running Linux or Windows. It is designed to let a single administrator manage hundreds of servers.
According to Rich Green, Cassatt executive vice president of product development, the core technology of the product provides "on demand imaging of software onto physical and virtual machines, generating personalized operating systems and application images from a collection of files." Rather than storing whole images, Collage constructs images on the fly, which allows system updates, such as patches, to be applied every time the system reboots.
"Our virtual provisioning technology—just-in-time-provisioning—can bring up a running system in a short amount of time. We can go from a cold shutdown of a diskless server to running application in a single-digit number of minutes," said Green. "Rapid provisioning capability is essential for reallocating payloads; for example, if you have to load balance between server tiers, you can shut down a system and reallocate applications and operating system on any given tier."
However, version 2.0 supports only single-tier Linux environments, typically for high-performance computing scenarios that can benefit from lower-cost, scale-out x86-based systems rather than costly scale-up SMP systems. Version 3.0, which is due in the first quarter of 2005, will provide the service-level automation capabilities, support for n-tier and multi-tier environments, and support for Windows 2000 and 2003.
The service-level automation for multi-tier environments will span dozens of application nodes, Green said. "You can construct policies and monitors, detect failure, balance resources and re-provision across tiers or from a spare pool and bring it up without the need for an administrator." Data virtualization spreads data files across a number of systems; if the data pipe is getting clogged, additional nodes (blades or other servers) can be added dynamically, without costly over-provisioning.
"The exciting part about Collage is that applications don't have to be changed," according to IDC analyst Dan Kusnetzky, who is familiar with Cassatt's products and strategy. "Data virtualization software creates a shared file system that every application can see. You also have orchestration software to manage tasks via a single administrative function and have the ability to move sources around the network as needed. For example, Oracle, JBoss, or Apache could be installed on system automatically and reassigned or provisioned without any changes to application software."
Pricing for Collage 2.0 is $25,000 for the controller software and $1,500 per node or server managed. An SAP or Oracle solution would require about 30 nodes, the company said. Version 3.0 pricing has not been set.
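To put those list prices in perspective, here is a minimal back-of-the-envelope sketch. It assumes the controller fee is charged once and the per-node fee applies to every managed server, with no volume discounts or support fees; the function and constant names are made up for illustration and are not from Cassatt.

```python
# List prices quoted in the article for Collage 2.0 (US dollars).
CONTROLLER_PRICE = 25_000   # one-time controller software fee
PER_NODE_PRICE = 1_500      # fee per node or server managed


def collage_list_price(nodes: int) -> int:
    """Estimate list price for a deployment of `nodes` managed servers.

    Assumption: one controller per deployment, no discounts.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return CONTROLLER_PRICE + PER_NODE_PRICE * nodes


if __name__ == "__main__":
    # The company's own sizing example: about 30 nodes for SAP or Oracle.
    print(f"${collage_list_price(30):,}")  # prints $70,000
```

Under these assumptions, the roughly 30-node SAP or Oracle configuration the company describes would list at about $70,000, with the node fees making up nearly two-thirds of the total.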
Cassatt has announced partnerships with Ascential, Engenious, IBM, Informatica, and Kx Systems to deliver solutions. The company says it is currently working with Pfizer Pharmaceutical, Cisco Systems, the National Highway Traffic Safety Administration and the U.S. Department of Transportation.
Clearly, Cassatt has tapped into a potentially rich vein. According to Kusnetzky, the market for what IDC calls virtual software environments was $4.3 billion in 2003 and will grow at a 19 percent compound annual rate through 2008.
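For a sense of what that growth rate implies, the following rough sketch compounds the 2003 figure forward. It assumes five full years of compounding from the 2003 base through 2008; IDC's own 2008 figure is not given here, so treat the result as illustrative arithmetic rather than the analyst firm's forecast. The function name is invented for this example.

```python
def project_market(base_billions: float, annual_rate: float, years: int) -> float:
    """Compound a starting market size at a fixed annual growth rate."""
    return base_billions * (1 + annual_rate) ** years


if __name__ == "__main__":
    # Figures from the article: $4.3 billion in 2003, 19 percent CAGR.
    # Assumption: 2003 -> 2008 counts as five compounding periods.
    size_2008 = project_market(4.3, 0.19, 2008 - 2003)
    print(f"Implied 2008 market: ${size_2008:.2f} billion")
```

On those assumptions the market would a bit more than double over the period, to something over $10 billion.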
"It's a noisy environment now," Kusnetzky said. "Many companies are developing software to virtualize environments. There are number of competitors-including HP, IBM, Microsoft, Sun, Opsware, PolyServe, Qlusters and Meiosys—some of which sound similar to Collage but use a different technology approach." Qlusters and Meiosys, in particular, appear to be similar, Kusnetzky said.
In my conversation with Coleman in September, he asserted that Cassatt has advantages over larger competitors like IBM, HP and Sun: "We have a more platform independent point of view. We don't have business models at risk like the big guys with proprietary hardware and software or the need to re-instrument applications or buy new versions of applications." On the other hand, IBM's Linux division is working with Cassatt, as well as several other startups, to supplement its offerings and check out the competition.
Coleman also claimed that Cassatt has a secret sauce that solves the problem of scaling infrastructure and reducing headcount without degrading performance. He cited a three-dimensional matrix consisting of physical infrastructure capacity (processing, storage and network); throughput (the capacity needed to run a job); and quality of service (if you want to go from four-nines to five-nines). "The three dimensions became the breakthrough by which we manage dynamically what is running and where, and how to map I/O processing and virtual LAN operations," Coleman said.
Cassatt was named after Mary Cassatt, a leading American impressionist (hence, Collage), and her brother, Alexander Cassatt, who as president of the Pennsylvania Railroad oversaw the building of Penn Station, which connected New York City to the rest of the country. The company appears to have an advantage in the pedigree of its executive staff. The impressive roster includes Steve Oberlin, former president of Unlimited Scale and chief architect at Cray Research; Brian Berliner, a co-founder of Allocity, an application-specific storage management company; Mark Forman, former CIO for the US Office of Management and Budget; Rich Green, former Sun VP of programming tools; and Rob Gingell, a former Sun Fellow and Chief Engineer at Sun. In addition, Cassatt hired a Colorado Springs-based team of Sun engineers working on remote distributed management.
Even with its pedigree, deep pockets and technology base, Cassatt and others trying to bring automation to enterprises face a non-trivial, non-technical challenge. Most companies don't have a clean room full of Intel-based servers ready for automation. They are still trying to come to grips with how to migrate legacy systems into the new world of on-demand computing and how to fund those new initiatives. Nonetheless, Collage and its competitors are the best hope for resolving the cost and complexity problem in the data center.