Beowulf clusters on Linux merge low costs with raw power

Examine the hardware and configuration that one administrator used to set up a powerful Beowulf cluster on Linux.

One of the strengths of a Beowulf cluster is its relatively low cost. Using common, off-the-shelf components to build the nodes keeps the cost of a Beowulf cluster lower than that of other types of clusters, and significantly lower than that of a supercomputer. When the final bill is tallied, however, it’s apparent that the Beowulf’s low cost is partly the result of a balancing act: the lower cost of the off-the-shelf server hardware offsets the higher cost of the powerful network hardware the cluster needs. (The network hardware has to be fairly powerful to handle the demand that the cluster places on it.)

In this Daily Drill Down, I outline how to build a cluster from scratch by specifying a possible hardware configuration for it. Then I provide a detailed list of the cluster’s costs so that you can get an idea of what a complete cluster installation might cost.

Beowulf information
For more information on the Beowulf cluster, check out my article "The mysteries of Linux clustering technology revealed."

The setup overview
My Beowulf cluster consists of the following:
  • Two master servers
    The two master servers provide access to the primary network and ensure high availability of the cluster. The services they provide remain accessible even if one of the master servers fails. Each server has two bonded Gb Ethernet connections to the cluster network so that it can keep up with the traffic from the many client nodes.
  • Sixty client nodes
    The number of client machines you choose will depend on your needs. I will not differentiate the nodes by function, so all of the nodes in my cluster are identical. I could have built the client nodes from parts, but that is no longer as cost-effective as it used to be, and the warranty coverage is not as complete as it is on a PC bought from a vendor. Therefore, I chose a workstation vendor and specified a configuration. All of the client workstations share a central monitor, keyboard, and mouse through KVM switches, and each client is equipped with a Gb Ethernet adapter to communicate with the network.
  • A network
    My Beowulf cluster has a Gb Ethernet layer-2 switch at its core. Because all of the nodes sit on a single cluster network, layer-3 routing is neither desired nor required.
  • Room for expansion
    I have also left room to expand the cluster in the future if needed.

The master servers
Since I prefer Dell equipment, I based both the master servers and the client nodes of my Beowulf cluster on Dell hardware. For the master servers, I want fast hardware with built-in redundancy and reliability (multiple power supplies and RAID, for example).

Here is the configuration that I chose for the two Dell PowerEdge 4600s that serve as my master servers:
  • Dual Intel Xeon 2.2 GHz processors, 512 KB cache
  • 2 GB DDR SDRAM (4 x 512 MB)
  • Redundant power supplies
  • PERC3/Di 128 MB (2 internal channels)
  • 3 x 18 GB 15,000 RPM Ultra 160 SCSI hard drives (hot-pluggable)
  • 2 x Intel Pro 1000XT 1000Base-T Gb network adapters
  • Dell Remote Assistant Card
  • 17-inch monitor
  • CD-ROM drive, floppy drive, keyboard, and mouse
  • 3-year, next-business day, on-site warranty

I chose this configuration to get a very powerful server that still leaves room for expansion. I chose the Dell PowerEdge 4600 instead of a lower-end model because it is available with Xeon processors. I also chose 15,000-RPM disks instead of the usual 10,000-RPM disks to maximize the performance of the disk subsystem, since a faster spindle reduces rotational latency.
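To put the disk choice in numbers, this short Python sketch computes average rotational latency, which is the time the platter needs for half a revolution. It covers only that one component of access time (seek time and transfer rate are ignored), so treat it as a rough comparison rather than a benchmark.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency in milliseconds for a given spindle speed.

    On average the target sector is half a revolution away from the head.
    """
    seconds_per_revolution = 60.0 / rpm
    return (seconds_per_revolution / 2) * 1000.0


if __name__ == "__main__":
    # 15,000 RPM: master-server disks; 10,000 RPM: the "usual" alternative;
    # 7,200 RPM: the ATA drives in the client nodes.
    for rpm in (15_000, 10_000, 7_200):
        print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Going from 10,000 to 15,000 RPM cuts this latency by a third, which matters most for the small random reads and writes a busy master server sees.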

The Dell PowerEdge 4600 comes with both Gb and 10/100 Ethernet adapters on the motherboard. I added the two Intel Gb Ethernet adapters so that the two channels bonded together to the cluster network run on identical hardware, which makes bonding easier. I use the on-board Gb adapter to connect to the outside network.
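On a Red Hat system, bonding the two Intel adapters is typically done with ifcfg files under /etc/sysconfig/network-scripts. This Python sketch only renders those files as text. The interface names (eth1, eth2), the 10.0.0.1 address, and the balance-rr mode are hypothetical choices, and the mode must match how the switch ports are configured (a static EtherChannel for balance-rr, LACP for 802.3ad). Newer Red Hat releases read bonding options from BONDING_OPTS in ifcfg-bond0; older ones expect them in the bonding module's configuration instead.

```python
from typing import Dict, List


def render_ifcfg(settings: Dict[str, str]) -> str:
    """Turn an ordered mapping of KEY -> value into ifcfg file text."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def bond_configs(
    bond: str,
    slaves: List[str],
    ipaddr: str,
    netmask: str,
    bonding_opts: str = "mode=balance-rr miimon=100",  # hypothetical mode choice
) -> Dict[str, str]:
    """Return {filename: contents} for the bond master and each slave NIC."""
    files = {
        f"ifcfg-{bond}": render_ifcfg({
            "DEVICE": bond,
            "IPADDR": ipaddr,
            "NETMASK": netmask,
            "ONBOOT": "yes",
            "BOOTPROTO": "none",
            # Quoted because the option string contains spaces.
            "BONDING_OPTS": f'"{bonding_opts}"',
        })
    }
    for nic in slaves:
        # Each physical adapter is enslaved to the bond and carries no IP itself.
        files[f"ifcfg-{nic}"] = render_ifcfg({
            "DEVICE": nic,
            "MASTER": bond,
            "SLAVE": "yes",
            "ONBOOT": "yes",
            "BOOTPROTO": "none",
        })
    return files


if __name__ == "__main__":
    # Two identical Intel Gb adapters on the cluster side (names assumed).
    configs = bond_configs("bond0", ["eth1", "eth2"], "10.0.0.1", "255.255.255.0")
    for name, text in configs.items():
        print(f"# /etc/sysconfig/network-scripts/{name}")
        print(text)
```

Using identical adapters for eth1 and eth2 is what keeps this simple: both slaves share one driver and one set of link characteristics.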

Linux cuts cost
The Dell PowerEdge line can be shipped with Red Hat Linux, which will cut $750 off the cost of the server. (This is quite a contrast to adding $3,000 for NT Server.)

On the warranty front, I chose a three-year, next-business-day warranty. Because this is a highly available cluster with two master servers, the surviving server can carry the load while the other is down for repair, so a faster (and costlier) response time is unnecessary. I also configured the servers with Dell Remote Assistant Cards to allow remote administration over the network, even when the server is down. Finally, to help ensure high availability, I configured the servers with redundant power supplies.

Each of these units costs about $8,500 before tax and shipping, so for two units I spent at least $17,000 of my budget.

Clustered nodes
My Beowulf cluster must be fast and reliable. Therefore, I chose single-processor Dell workstations with adequate RAM and fast disks, as well as Gb Ethernet adapters. Redundancy in the individual node is not necessary since nodes can easily be replaced. Additional nodes can easily be added, as well, to increase the overall power of the cluster.

Each of my clustered nodes is a Dell Precision 340 configured with the following specs:
  • Intel Pentium 4 processor, 1.80 GHz
  • 512-MB PC800 ECC RDRAM
  • ATI Rage 128 Ultra with 32 MB RAM
  • 40-GB 7200 RPM ATA-100 IDE hard drive
  • Microsoft Windows XP Professional
  • Logitech PS/2 mouse (2-button, no scroll)
  • Keyboard, CD-ROM drive, and floppy drive
  • Gigabit Ethernet adapter
  • 3-year parts and on-site labor warranty (next business day)

Each of these units costs approximately $1,350 before tax and shipping, which reinforces the concept of inexpensive commodity hardware for the nodes in a Linux Beowulf cluster. The total cost for the sixty clustered nodes is $81,000.

Why Windows?
You may notice the line “Microsoft Windows XP Professional” in the specifications. Unfortunately, there is no option to ship these workstations without an operating system. When they arrive, I will simply format the drives and install Linux; at $1,350 per machine, I don’t mind. If you are a Linux purist, you may cringe at that. Never fear: for a pure Linux workstation, take a look at PogoLinux’s Altura line.

The network gear
The network is one of the most critical components of a Beowulf cluster; under-provision it and you risk lackluster performance. For this piece of the cluster, I chose the Cisco Catalyst 4006 switch. With its 64-Gbps backplane, it is more than powerful enough to handle the requirements of this cluster, and Cisco is a well-established vendor.

Here are the general specs for the Cisco Catalyst 4006 switch:
  • Supervisor Engine II (switch fabric and management engine)
  • 3 x 24 Gbps switching engines
  • 18 million packets per second throughput
  • Layer 2 switching
  • 2 x 48-port 10/100/1000BaseT Ethernet modules
  • Dual power supplies and a fan tray

The street price for this configuration is about $28,000. It not only provides enough backplane bandwidth to support my cluster, but also leaves three module slots free so that I can add more nodes later.
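The port and bandwidth arithmetic behind this choice fits in a few lines of Python. The counts (two 48-port modules, sixty nodes, two servers with two bonded links each, three free slots) come from this article. The assumptions are that each free slot would take another 48-port module and that every port runs at 1 Gbps line rate. When comparing the demand figure to a switch's fabric rating, check whether the vendor quotes the rating per direction or as a full-duplex total.

```python
from dataclasses import dataclass


@dataclass
class SwitchPlan:
    modules_installed: int
    ports_per_module: int
    free_slots: int
    nodes: int
    servers: int
    links_per_server: int
    link_gbps: float = 1.0

    def ports_used(self) -> int:
        # One port per node plus each server's bonded links.
        return self.nodes + self.servers * self.links_per_server

    def ports_free(self) -> int:
        return self.modules_installed * self.ports_per_module - self.ports_used()

    def expansion_ports(self) -> int:
        # Assumes every free slot is filled with the same module type.
        return self.free_slots * self.ports_per_module

    def edge_demand_gbps(self) -> float:
        # Worst case: every cabled port sending at line rate in one direction.
        return self.ports_used() * self.link_gbps


if __name__ == "__main__":
    plan = SwitchPlan(modules_installed=2, ports_per_module=48, free_slots=3,
                      nodes=60, servers=2, links_per_server=2)
    print(f"Ports used: {plan.ports_used()}, free on installed modules: {plan.ports_free()}")
    print(f"Extra ports if all free slots are filled: {plan.expansion_ports()}")
    print(f"Worst-case edge demand: {plan.edge_demand_gbps():.0f} Gbps per direction")
```

Real cluster traffic rarely has every port saturated at once, so the worst-case figure is an upper bound for sizing rather than a prediction.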

Miscellaneous hardware
Once I had the servers, the clustered node workstations, and the network gear, I needed only a few additional purchases to get everything up and running. Since my workstations will not have monitors, I had two choices. I could use Secure Shell (SSH) to access the machines remotely when needed, or I could use KVM switches to connect all of the nodes to a central keyboard, monitor, and mouse. Because I like to be able to reach the console when I am having trouble with the network, I opted to purchase KVM switches and cables. I bought these from my favorite KVM vendor, Raritan.
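If you take the SSH route instead of (or alongside) the KVM switches, a small script can run the same command on every node. This Python sketch assumes key-based SSH logins are already set up and that the nodes are named node01 through node60; both are hypothetical. BatchMode and ConnectTimeout are standard OpenSSH client options.

```python
import subprocess
from typing import Dict, List


def node_names(count: int, prefix: str = "node") -> List[str]:
    """Hypothetical naming scheme: node01, node02, ... up to count."""
    return [f"{prefix}{i:02d}" for i in range(1, count + 1)]


def ssh_command(host: str, remote_cmd: str) -> List[str]:
    """Build the argument list for one non-interactive ssh call."""
    # BatchMode=yes makes ssh fail rather than prompt for a password, and
    # ConnectTimeout stops an unreachable node from stalling the whole loop.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote_cmd]


def run_everywhere(hosts: List[str], remote_cmd: str) -> Dict[str, str]:
    """Run remote_cmd on each host in turn; collect stdout or a short error note."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            proc = subprocess.run(ssh_command(host, remote_cmd),
                                  capture_output=True, text=True)
        except OSError as exc:  # e.g. no ssh client installed
            results[host] = f"error: {exc}"
            continue
        if proc.returncode == 0:
            results[host] = proc.stdout.strip()
        else:
            results[host] = f"error: exit status {proc.returncode}"
    return results


if __name__ == "__main__":
    # Dry run: show the commands that would be issued for the first few nodes.
    for host in node_names(60)[:3]:
        print(" ".join(ssh_command(host, "uptime")))
```

Calling run_everywhere(node_names(60), "uptime") performs the real calls one node at a time; the KVM remains the fallback when a node is unreachable over the network.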

All in all, four 16-port KVM units and the necessary cables cost about $6,000. In addition, I needed a large number of good-quality Category 5e (or better) patch cables, which added another $500.

The total cost
The total cost of all of the parts described is broken down in Table A.

Table A
Description                                 Cost
Servers                                  $17,000
Clustered workstations                   $81,000
Network hardware                         $28,000
Miscellaneous equipment and supplies      $6,500
Total cost                              $132,500
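The totals in Table A are easy to double-check. This Python sketch rebuilds each line item from the unit prices quoted earlier in the article and adds a per-node figure, which is simply the total divided by the sixty compute nodes.

```python
# Line items from Table A in US dollars, before tax and shipping,
# rebuilt from the unit prices quoted earlier in the article.
COSTS = {
    "Servers (2 x PowerEdge 4600 at $8,500)": 2 * 8_500,
    "Clustered workstations (60 x Precision 340 at $1,350)": 60 * 1_350,
    "Network hardware (Catalyst 4006)": 28_000,
    "Miscellaneous (KVM gear $6,000 + patch cables $500)": 6_000 + 500,
}
NODES = 60

total = sum(COSTS.values())

for item, cost in COSTS.items():
    print(f"{item:<56} ${cost:>8,}")
print(f"{'Total cost':<56} ${total:>8,}")
print(f"Cost per compute node: ${total / NODES:,.2f}")
```

Changing one unit price (for example, dropping the KVM line to model the cart-based alternative) shows its effect on the total immediately.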

Wait! Before you balk at the cost, keep in mind that this configuration covers sixty nodes and two redundant, highly available servers, all served by a 96-port 10/100/1000 switch, with the nodes sharing one central keyboard, monitor, and mouse.

If you want to spend less, you can easily scale back the server configuration or find a workstation vendor with lower prices. You can also choose different network hardware, or run at 100 Mbps rather than at gigabit speeds. Another option is to skip the KVM solution and use a monitor, keyboard, and mouse on a cart; this portable solution would save approximately $6,000. On the other hand, if you want to spend more, you could boost the power with larger servers or dual-processor nodes. Either way, the design can be adapted to meet practically any clustering need.

With a supercomputer costing anywhere from hundreds of thousands of dollars to millions, it’s easy to see why a Beowulf cluster can be cost-effective. The Beowulf cluster also offers significant benefits beyond cost that you will not find in a supercomputer or mainframe. Expanding the cluster simply means adding more machines, and because a Beowulf can be upgraded quickly and cheaply, this type of cluster is well suited to situations where scalability is key. In the same vein, individual nodes can be assigned to individual tasks, which gives the cluster a great deal of flexibility in the enterprise environment.

What does it look like?
As you can see in Figure A, I have connected 30 of the clustered nodes to each of the two 48-port 10/100/1000 blades in the Catalyst 4006. Each of the servers also connects to one of the two blades through two bonded (or trunked) Gb Ethernet channels.

Figure A
Seeing the scope of our cluster puts to rest any unease concerning the cost.
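One way to record this cabling is a port map. This Python sketch assumes the two blades sit in slots 2 and 3, that nodes are named node01 through node60, and that each master server's bonded pair uses the last two ports of its blade, one server per blade. The article does not specify any of these details, so adjust them to match your own cabling.

```python
from typing import Dict, Sequence


def build_port_map(
    blade_slots: Sequence[int] = (2, 3),
    servers: Sequence[str] = ("master1", "master2"),
    nodes_per_blade: int = 30,
    links_per_server: int = 2,
    ports_per_blade: int = 48,
) -> Dict[str, str]:
    """Return {"slot/port": device} for the cabling plan (all names hypothetical)."""
    if len(servers) != len(blade_slots):
        raise ValueError("this sketch places exactly one master server on each blade")
    if nodes_per_blade + links_per_server > ports_per_blade:
        raise ValueError("not enough ports on a blade for this plan")

    port_map: Dict[str, str] = {}
    node_number = 1
    for slot, server in zip(blade_slots, servers):
        # Nodes fill the low-numbered ports, 30 per blade by default.
        for port in range(1, nodes_per_blade + 1):
            port_map[f"{slot}/{port}"] = f"node{node_number:02d}"
            node_number += 1
        # The server's bonded links take the highest-numbered ports
        # (47 and 48 with the defaults).
        first_server_port = ports_per_blade - links_per_server + 1
        for link in range(links_per_server):
            port_map[f"{slot}/{first_server_port + link}"] = f"{server}-link{link + 1}"
    return port_map


if __name__ == "__main__":
    ports = build_port_map()
    print(f"{len(ports)} ports cabled")
    for port in ("2/1", "2/30", "2/47", "2/48", "3/1", "3/30"):
        print(port, "->", ports[port])
```

Splitting the nodes and servers across the blades this way means losing one blade takes out half of the nodes and one server, while the other half and the second server keep running.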


Each server’s on-board Gb Ethernet adapter connects it to the external network, and its on-board 100-Mbps Ethernet adapter carries the heartbeat between the two servers. As Figure A shows, the only single point of failure in this design is the Catalyst backplane. All of the other components are redundant, and the cluster can continue to function even after the loss of multiple nodes, a single 48-port Catalyst blade, or one of the servers.
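The heartbeat lets each master server notice when its partner has stopped responding. This Python sketch shows only the decision logic: the peer is considered down after a configurable number of missed heartbeat intervals. The one-second interval and three-miss limit are hypothetical, sending and receiving packets over the 100-Mbps link is not shown, and a production cluster would normally rely on a dedicated package such as Linux-HA’s Heartbeat rather than hand-rolled code.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Decide whether the partner master server should be treated as failed.

    A heartbeat is expected every `interval` seconds; after `missed_limit`
    intervals with no heartbeat, the peer is considered down. Both defaults
    are hypothetical tuning values.
    """

    def __init__(self, interval: float = 1.0, missed_limit: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.missed_limit = missed_limit
        self.clock = clock
        # Start the timer at construction so a freshly booted server
        # gives its partner a grace period instead of taking over at once.
        self.last_seen = clock()

    def beat(self) -> None:
        """Record that a heartbeat packet just arrived from the peer."""
        self.last_seen = self.clock()

    def peer_alive(self) -> bool:
        return self.clock() - self.last_seen < self.interval * self.missed_limit

    def should_take_over(self) -> bool:
        """True once the peer has been silent for missed_limit intervals."""
        return not self.peer_alive()


if __name__ == "__main__":
    # A simulated clock keeps the demo instant and repeatable.
    now = [0.0]
    monitor = HeartbeatMonitor(interval=1.0, missed_limit=3, clock=lambda: now[0])
    for t, heartbeat_received in [(1.0, True), (2.0, True), (4.0, False), (5.5, False)]:
        now[0] = t
        if heartbeat_received:
            monitor.beat()
        print(f"t={t:.1f}s take over: {monitor.should_take_over()}")
```

Injecting the clock keeps the failover rule testable without waiting in real time; the same class would be driven by time.monotonic in practice.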

Things to remember
The final product of this exercise is a fast, expandable, highly available, and powerful Linux Beowulf cluster. To achieve this power you must remember these critical points:
  • The network equipment chosen is critical. The network must have enough bandwidth to handle the load of the cluster.
  • In order to provide high availability, the server hardware itself must be redundant.
  • The workstations, while the primary workhorse of a cluster, are commodity items and should be treated as such.

Coming up next
In upcoming articles, the TechProGuild Linux track will continue to examine the nuts and bolts of the Beowulf cluster. Specifically, there will be an article covering the software you’ll need to create an effective Beowulf cluster, as well as a complete how-to guide on building your own Linux Beowulf cluster.

 