By Larry Seltzer
I've heard it argued for many years that "computers are fast enough." I finally accepted this argument recently with respect to the average business desktop, but for servers, it's still the case that faster usually equates to better performance. But achieving that performance can be tricky.
Apart from accepting the usual periodic improvements in CPU clock speed and the like, we have had two roads to travel in the pursuit of better performance:
- Adding more processors and memory
- Using a larger number of cheaper, networked servers and dividing up the tasks
It's important not to get the idea that these two approaches are mutually exclusive, except to the extent that budgetary constraints limit your design options. You could even design a cluster of high-end multiprocessing systems, although unless you work for the government, that would probably be extravagant.
Issues of scale
In the classic PC architectures, the operating system allocates time to the various programs and threads within programs. In a cluster of systems dedicated to an application, the application's own design likely helps decide which system in the cluster handles a given piece of work. But because it's difficult to scale most single applications beyond a few processors on any operating system, you'll find that at the very high end, other techniques are used to scale the software.
Most modern operating systems implemented on really huge servers support some form of partitioning; this means that certain applications are bound to specific processors or at least are given priority access to those processors. In almost any case where a company is running 16 or more processors on a server (such as on those zillion-dollar Sun E10000 systems), you can assume that the processors are allocated among specific applications rather than all running a single operating system instance, and certainly not a single application.
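To make the partitioning idea concrete, here is a minimal sketch in Python. It is not how any particular vendor implements partitions: the application names and the 16-processor count are hypothetical, and the function names `partition_cpus` and `bind_current_process` are invented for illustration. The only real operating system call used is `os.sched_setaffinity`, which Python exposes on Linux; the sketch simply reports failure on platforms that lack it.

```python
import os


def partition_cpus(cpu_ids, requests):
    """Assign each application a disjoint set of CPUs.

    cpu_ids  -- iterable of available CPU numbers
    requests -- dict mapping application name -> number of CPUs wanted
    """
    available = sorted(cpu_ids)
    if sum(requests.values()) > len(available):
        raise ValueError("more CPUs requested than the server has")
    plan, start = {}, 0
    for app, count in requests.items():
        # Hand out the next `count` CPUs; slices never overlap.
        plan[app] = set(available[start:start + count])
        start += count
    return plan


def bind_current_process(cpus):
    """Pin the calling process to `cpus` where the OS supports it (Linux)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)  # 0 means "this process"
        return True
    return False


# Hypothetical 16-processor server shared by three applications.
plan = partition_cpus(range(16), {"database": 8, "web": 6, "batch": 2})
```

In this sketch each application's launcher would call `bind_current_process(plan[name])` before starting work, so the database never competes with the batch jobs for a processor.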
IBM uses another technique—virtual machines (VMs)—on those Linux mainframes they've been pushing on TV lately. Even though Linux is inherently a multiuser, multitasking operating system, IBM runs multiple instances of Linux, each in a separate protected virtual machine, on these systems for two reasons.
First, while Linux rarely crashes, if one of the instances did crash, the others would continue to run unhampered. More significantly, VM architecture lets the operating system scale better than it would on its own. Part of this may be limitations in Linux itself, but good multiuser performance requires intelligent design in server applications too. A badly designed server application might not perform well with more than a few users, but you might still be able to support more users by running multiple instances of it.
IBM also says that it gets better performance from VMs when consolidating several low-intensity servers (such as DNS, firewall, and remote access) than it would from combining them into a single instance.
Scaling in the sense of the IBM commercials also means consolidation, as I discussed in a recent column. But planning for adequate performance in such an environment requires more than just throwing processors and memory at the problem; it requires complicated planning.
Consider, for example, the impact on network traffic of consolidating 25 servers onto one big Linux mainframe. On the one hand, much of the network traffic that used to flow between servers becomes interprocess or inter-VM communication within the server itself and is therefore translated into a CPU/memory cost. But on the other hand, you used to have at least 25 network interfaces dividing up that traffic; how many do you need in the new, consolidated server?
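A back-of-the-envelope version of that question can be sketched in Python. Every number below is a made-up assumption, not a measurement: 25 servers each peaking at 20 Mbps, 40 percent of the traffic being server-to-server (and so becoming internal after consolidation), 100 Mbps interfaces, and a rule of thumb that no interface should run above 50 percent utilization. The function name `nics_needed` is likewise hypothetical. A real modeling tool would account for far more than this.

```python
def nics_needed(servers, peak_mbps_each, internal_pct, nic_mbps, max_util_pct):
    """Estimate how many network interfaces a consolidated server needs.

    internal_pct -- percent of traffic that was server-to-server and now
                    stays inside the box as inter-VM communication
    max_util_pct -- highest utilization you are willing to run each NIC at
    """
    total_mbps = servers * peak_mbps_each
    # Both sides are scaled by 100 so the arithmetic stays in integers.
    external_x100 = total_mbps * (100 - internal_pct)
    usable_x100 = nic_mbps * max_util_pct
    # Ceiling division: a fraction of an interface still means one more card.
    return max(1, -(-external_x100 // usable_x100))


estimate = nics_needed(servers=25, peak_mbps_each=20, internal_pct=40,
                       nic_mbps=100, max_util_pct=50)
```

With these assumed inputs the estimate comes out well below the original 25 interfaces, but the answer swings widely with the internal-traffic percentage, which is exactly the figure that is hard to guess.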
For large installations, I don't think any human being can really guess this sort of thing; that's a job for a modeling application. IBM has its own internal configurator tools for sizing servers, and there are also companies such as CIOview that sell very expensive tools to help determine the right configuration for a migration (based either on standard benchmarks like TPC or your own metrics). These tools also attempt to estimate cost, specifically total cost of ownership (TCO).
Consolidation goals aside, which types of apps scale better on multiple systems and which on a single larger system? The answer has to be a gross overgeneralization, but we all like to hear those, don't we? Large applications that are built for a single image, like an Oracle server or Microsoft SQL Server, likely require a more powerful system for better performance. Stateless multiuser applications, like big Web servers, scale very well to large clusters of systems.
For example, I don't know how many servers make up www.microsoft.com, but I'm sure it's a lot. Incidentally, Microsoft probably uses what it calls network load balancing (NLB) to divide the load among the different servers for that site, and Microsoft considers NLB part of Windows Clustering Services. It essentially accomplishes the same goals as round-robin DNS, another "clustering" technique.
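The round-robin idea is simple enough to sketch in a few lines of Python. This illustrates plain rotation only, not Microsoft's NLB algorithm, and the host names are hypothetical. Because the requests are stateless, any server can take any request, which is why this kind of application spreads so easily across a cluster.

```python
from collections import Counter
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in fixed rotation."""
    return cycle(servers)


servers = ["web1", "web2", "web3"]  # hypothetical host names
rotation = round_robin(servers)

# Send nine incoming requests to whichever server is next in the rotation.
assignments = [next(rotation) for _ in range(9)]
load = Counter(assignments)  # requests handled per server
```

Adding capacity in this scheme means adding a name to the list. The rotation takes no account of how busy each server is, which is one reason dedicated load balancers exist.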
Microsoft has customers with hundreds of systems working on the same computing problem simultaneously, with work under way on a 1,024-system cluster. If you count SETI@home, it's clear that you can scale an application far higher than 1,024 systems, although it's not clear that many business applications run on clusters much larger than that.
I'm positive that the market is big enough for both approaches, and they both make sense. I would say that whatever approach to scaling your applications is least disruptive to your organization is the one that gets the benefit of the doubt. Any other approach had better prove major-league cost savings, because it's more likely to have unpredictable costs.
Larry Seltzer has written software and computer articles since 1983. He has worked for software companies and IT departments and has managed test labs at National Software Testing Labs, PC Week, and PC Magazine.