By Todd Underwood

Server clustering is the architecture of choice for Web service delivery and computation across networks. Clustering eludes a single definition, embracing several distinct concepts. In general, the idea is to use multiple computers to accomplish a single task.

When the task is a large computation, the tactic is referred to as computational clustering. Letting one server take over for another in the event of failure is failover clustering. Using multiple machines to handle more traffic than a single machine could is called load balancing. Load balancing and failover clustering are frequently combined to deliver highly available, scalable services such as Web access, e-mail, and FTP servers.

Recently, Linux clustering has made great strides, bringing high-availability computing to a much wider audience by offering these services at a far more reasonable cost.

This article is published courtesy of CNET’s Enterprise Business section, where you can explore IT business solutions on various topics including ASPs, Linux, groupware, information systems infrastructure, supply chain management, and much more.

Behemoths become clusters
Clustering, as a paradigm for large-system design, has become popular primarily for financial reasons. In the mid-1980s, if you wanted to serve more customers or calculate bigger numbers, you bought a bigger, faster, more expensive computer. IBM, Sun, HP, and others had no problem selling large computers because big machines could do things smaller computers couldn’t.

By the early 1990s, researchers realized that similar results could be achieved by spreading the workload across many smaller, less expensive systems. Big iron vendors such as Cray and IBM were already implementing this kind of parallelism inside their supercomputers, so the idea of doing it on a smaller scale made sense.

From this concept came efforts such as the Beowulf Project, which was formed to create computational clusters out of Linux machines. In fact, the Cplant cluster at Sandia National Laboratories is composed entirely of Linux servers running on Compaq Alpha processors and is now considered one of the 100 fastest supercomputers in the world (see the TOP500 site for the rest).

Clustering for services
For business applications, clustering is used to provide failover and load-balancing services for Web sites (service clustering). Many vendors now offer failover and load-balancing solutions that cost substantially less to implement than supercomputer-style computational clusters.

Hardware vendors (such as Cisco, F5, and Foundry), software companies (such as PolyServe, Turbolinux, and Red Hat), and the Linux Virtual Server project have all developed clustering products. Over the past two years, Linux service clustering solutions have become much more affordable and easier to implement.

Hardware vs. software
A significant distinction is often drawn between hardware- and software-based service clustering. But all load-balancing systems run software of some kind; some software is simply integrated more tightly into a particular hardware platform.

Appliance-style service clustering devices have been around for a long time and typically work well. Their main drawback is cost: $10,000 or more for even a low-end device, and entry-level hardware comes with significant limitations. Hardware systems are also often less flexible than software clustering solutions, though they are usually somewhat easier to implement.

Software-based load balancing, provided by products such as Turbolinux Cluster Server, PolyServe LocalCluster Enterprise, PolyServe Understudy, and Linux Virtual Server, must be installed on top of a Linux operating system, which can make implementation more complicated. Rather than simply plugging in a server appliance, administrators must configure an operating system in addition to installing and configuring the clustering software. On the other hand, software-based systems are usually far less expensive.

Failover is typically provided in one of two ways: A routing server receives all incoming requests and forwards them only to servers that are functioning correctly, or the servers in the cluster monitor each other and take over automatically in the event of a failure. The first solution is easier to implement and allows the clustered servers to run any operating system. However, it’s more costly because it requires a hardware request router. The second solution solves the failover problem more directly but makes load balancing more difficult, since the machines must forward requests to each other to balance the load effectively.
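The first approach, a routing server that forwards requests only to healthy back ends, can be sketched in a few lines. This is a simplified illustration of the idea, not any particular product's implementation; the server addresses and the health-check callback are invented for the example.

```python
import itertools

# Hypothetical back-end pool; the addresses are placeholders.
SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def make_router(servers, is_healthy):
    """Return a function that yields the next healthy server, round-robin.

    `is_healthy` is a health-check callback (in a real router, something
    like a TCP connect probe). Unhealthy servers are skipped, which is
    how the router provides failover as well as load balancing.
    """
    pool = itertools.cycle(servers)

    def next_server():
        # Try each server at most once per request.
        for _ in range(len(servers)):
            candidate = next(pool)
            if is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy servers in cluster")

    return next_server

# Example: pretend the second server has failed.
router = make_router(SERVERS, is_healthy=lambda s: s != "10.0.0.12")
print([router() for _ in range(4)])  # 10.0.0.12 is skipped on every pass
```

The clustered servers behind such a router can run any operating system, since all the intelligence lives in the router itself.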

Most sites prefer to use one server (or two for increased reliability) to route requests, separate from the servers that respond to the requests. TurboCluster, Piranha, and Ultra Monkey are examples of products that provide request routing services. PolyServe's clustering products don't use a request router; they rely on round-robin DNS to distribute load, and the software itself provides only failover.
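Round-robin DNS requires no special software on the servers at all: the site's name simply resolves to several addresses, and the name server rotates the order of the answers it returns. A minimal BIND zone-file sketch (the name and addresses are placeholders):

```
; Three A records for one name. The name server cycles the order
; of the answers, spreading requests across the three Web servers.
www   IN  A   192.0.2.11
www   IN  A   192.0.2.12
www   IN  A   192.0.2.13
```

The catch is that DNS knows nothing about server health, which is why round-robin DNS distributes load but cannot by itself provide failover.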

Shared storage
Another sticky issue with cluster implementations is serving the same content from all of the servers in a cluster. Since several Web servers might be used for a single site, it’s crucial that each server has access to the same content files.

There are two ways to accomplish this: network file systems and replication. A network file system uses a network-based storage server, which could be another Linux-based server or a network appliance server. The clustered Web servers then mount the storage server’s file system using a protocol such as NFS. With this approach, all the servers in the cluster have access to the same set of files. This offers advantages such as simplicity and ease of implementation, but the trade-off is significant: The cluster has a single point of failure. If the storage server crashes, the whole cluster goes down.
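In practice, the NFS approach amounts to one export and one mount. A configuration sketch, assuming a storage server named `storage` that exports `/export/www` to the Web servers (all hostnames, paths, and the subnet are placeholders):

```
# On the storage server, in /etc/exports: share the content tree,
# read-only, with the Web-server subnet.
/export/www  192.0.2.0/24(ro,sync)

# On each Web server in the cluster: mount the shared content
# where the Web server expects to find it.
mount -t nfs storage:/export/www /var/www
```

Every server in the cluster now serves identical files, but the `storage` host is the single point of failure the paragraph above warns about.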

With replication, site content is copied from one server to the others in the cluster. Each server is completely independent and able to run even when another one in the cluster fails. But replication is a complicated process that’s difficult to configure and maintain. Commercial clustering products such as Turbolinux Cluster Server and PolyServe LocalCluster Enterprise include primitive replication tools. Replication may be a good solution for smaller sites, but larger sites would be wise to stay with highly reliable storage servers.
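At its simplest, replication is a periodic copy from a master server to the rest of the cluster, often scheduled with cron and rsync. A sketch of such a crontab entry; the hostnames and paths are illustrative, not taken from any of the products named above:

```
# Hypothetical cron entries on the master: push the document root
# to the other cluster members every five minutes.
# -a preserves permissions and timestamps; --delete removes files
# that have been withdrawn from the master.
*/5 * * * *  rsync -a --delete /var/www/ web2:/var/www/
*/5 * * * *  rsync -a --delete /var/www/ web3:/var/www/
```

Each server keeps its own copy of the content, so no single machine's failure takes the cluster down; the price is a window between updates during which servers can serve stale files.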

Clustering for everyone
Linux clustering isn’t perfect. Much of the available software is still rough around the edges, but it works and is relatively cheap. Traditional UNIX- and Windows-based clustering was originally designed for high-end data centers that could spend big bucks to ensure uninterrupted service. The advent of inexpensive Linux clustering solutions makes that same level of reliability available to companies that otherwise may not be able to afford it.
