
Build a Beowulf cluster with Red Hat Linux and VMware

See how you can install a Beowulf cluster inside VMware using a Red Hat Linux server.


The Beowulf cluster is a number-crunching monster. In fact, you would be hard-pressed to find another type of machine that will give you more number-crunching power for your dollar.

The problem is that there are relatively few IT pros that really understand how to create a Beowulf cluster. But creating a Beowulf cluster doesn’t have to be overly complex (or expensive, for that matter). Surprised? Don’t be. With the help of VMware, I am going to show you, step-by-step, the process of creating your own Linux Beowulf cluster. Once you see how easy it is, you might just be ready to implement this powerful workhorse in your organization.

The hardware
The primary purpose of this article is to explain how to build a Linux Beowulf cluster that doesn’t need high-end hardware or a speedy private network. With these parameters in mind, I have created two Linux virtual machines on VMware partitions running on my Windows XP machine.

Each VMware virtual machine is configured with 96 MB of RAM and 3 GB of disk space. Each node is also configured on its own private network that utilizes network address translation (NAT) to get to the outside world through my Windows XP machine.
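
If you want the nodes' addresses to stay fixed to match the hosts file shown later, each virtual machine can be given a static IP. Here is a minimal sketch of /etc/sysconfig/network-scripts/ifcfg-eth0 for the master node, assuming Red Hat's standard ifcfg format and VMware's usual NAT gateway at .2 (on Red Hat 7.2, the default gateway may live in /etc/sysconfig/network instead):

DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.29.129
NETMASK=255.255.255.0
GATEWAY=192.168.29.2
ONBOOT=yes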

RAM
I decided to limit each virtual machine to 96 MB of RAM, which is sufficient for the purposes of this article. For a production cluster, 256 MB of memory per node is recommended.

The operating system
For these installations, I chose a minimal Linux installation. I used the Red Hat Linux 7.2 distribution CDs, skipped the X Window System components, and installed mainly development tools and networking tools, such as rsh, NFS, and OpenSSH. Both installations are identical.

I installed the first machine into its VMware partition and then copied and renamed it for the second virtual machine. In the real world, using a tool such as Ghost or rsync would be the preferred method for creating identical nodes.

Host names and edit tips
Host names for these two machines are “mn” for the master node and “sn1” for the slave node. Also, for any file edits, feel free to use your favorite text editor. Mine happens to be Pico, so that is what I used for this example.

Security
Because this cluster is not attached to the outside world, I did not install a firewall. Also, for simplicity’s sake, I did everything as the root user. If you set up your own cluster, be sure to address these two issues to provide a reasonable level of security.

The clustering software
To begin, I had to make sure that the rsh and the telnet servers were installed. One way to see if the daemons are present is to run the setup utility (by typing setup at the command line), choose System Services from the menu, and see if the daemons appear in the list. (You can also query the RPM database directly, as shown after the list below.) If they do not appear, they can be installed from CD1 of the Red Hat distribution. If you are not using the Red Hat 7.2 CDs, the instructions below may not match your system. Here are the instructions for installing the servers and daemons:
  1. Insert Red Hat CD1.
  2. Mount it with mount /dev/cdrom.
  3. Change to the RPMS directory with cd /mnt/cdrom/RedHat/RPMS.
  4. Install the rsh server with rpm -i rsh-server-0.17-5.i386.rpm.
  5. Install the telnet server with rpm -i telnet-server-0.17-20.i386.rpm. Note that the telnet server also requires xinetd. If you don’t have xinetd installed, you will be informed of its absence. To install xinetd, type rpm -i xinetd-2.3.3-1.i386.rpm at the command line.
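
For the command-line check mentioned above, rpm -q will report each installed package with its version and flag anything missing as not installed:

rpm -q rsh-server telnet-server xinetd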

Gentlemen…start your connection services!
Once I knew that the rlogin, rsh, and telnet services were installed, I needed to make sure that they were enabled to start via xinetd. To do so, I edited each service's configuration file (a sample of the edited file appears after this list):
  • cd /etc/xinetd.d
  • pico rsh
    Find the line marked disable and make sure it is set to no rather than yes.
  • pico rlogin
    Do the same as for rsh.
  • pico telnet
    Do the same as for rsh.
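
For reference, here is roughly what /etc/xinetd.d/rsh looks like after the edit. This sketch is based on the stock Red Hat 7.2 file (note that the service is named shell, the traditional name for the rsh port); the other fields on your system may differ slightly:

service shell
{
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.rshd
        log_on_success  += USERID
        log_on_failure  += USERID
        disable         = no
}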

Set up access
First, I needed to set up the nodes in the cluster in my hosts file. Here is what my /etc/hosts file looks like on each node for my network:
127.0.0.1           { localhost information }
192.168.29.129      mn
192.168.29.130      sn1


The hosts.equiv file was used to allow connections between the clustered systems without passwords. Because I am providing this type of access to the root user, this configuration would be a huge security risk on a production cluster. On this test cluster, using hosts.equiv in this way is perfectly acceptable; in the real world, it is not.

Here is a copy of my /etc/hosts.equiv for each system:
mn     root
sn1    root


I also set up the file /root/.rhosts on both machines with the following entries:
localhost
mn
sn1


I also made a modification to /etc/pam.d/rsh to allow root to use the r commands, such as rsh and rlogin: I commented out the line containing /lib/security/pam_securetty.so. Be warned that this makes the system slightly less secure.
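
After the edit, the relevant portion of /etc/pam.d/rsh looks something like the sketch below; the neighboring lines are typical of the stock Red Hat 7.2 file and may differ slightly on your system:

auth       required     /lib/security/pam_nologin.so
#auth      required     /lib/security/pam_securetty.so
auth       required     /lib/security/pam_rhosts_auth.so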

The next step is to make sure the rsh server is enabled. To enable rsh, type the following at the command prompt (on both machines):
chkconfig rsh on
/etc/rc.d/init.d/xinetd restart


The second command (the one with restart) restarts xinetd, which manages all of the services configured in /etc/xinetd.d. To make sure that I could use rsh between the two machines in both directions, I did the following:
  • On mn, I issued the command rsh sn1 ls -al to get a directory listing from sn1.
  • On sn1, I issued the command rsh mn ls -al to do the same.

Since I received a directory listing from the remote machine in each case, both tests passed, confirming that rsh works in both directions.

Ready to cluster
At this point, my two machines were ready to be clustered. Network communication was in place with rsh, the hosts file was set up, and everything up to this point had been tested. The next step was to install the clustering communications system. For this example, I used Parallel Virtual Machine (PVM). PVM is common in Beowulf clusters and can be used to create a heterogeneous cluster of both Windows and Linux nodes.

Get the goods
As of this writing, the current release of PVM is 3.4.4; it is available for both Windows and UNIX and is touted as having “improved use on Beowulf clusters.” The file I downloaded from the site was pvm3.4.4.tgz, which I saved in the /usr/local directory. To uncompress the distribution, I switched to /usr/local, typed tar -zxvf pvm3.4.4.tgz at the command line, and pressed [Enter].

Building the PVM tool from source
At this point, I had the PVM source located at /usr/local/pvm3 on the machine named mn on my cluster. In order to be able to properly build PVM, I needed to first set an environment variable that points to the distribution. To do this, I typed export PVM_ROOT=/usr/local/pvm3 and hit [Enter].

To build the PVM distribution, I changed to the /usr/local/pvm3 directory and typed make at the command line. The defaults for the installation worked perfectly for my needs, so I let the installer determine all system variables as needed. The build process took a few minutes.
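
Assuming the build completes cleanly, the compiled binaries land in an architecture-specific subdirectory under $PVM_ROOT/lib; on Linux, the architecture name is typically LINUX. A quick sketch of the whole sequence, with a check at the end:

export PVM_ROOT=/usr/local/pvm3
cd $PVM_ROOT
make
ls $PVM_ROOT/lib/LINUX    # the pvmd3 daemon and pvm console should appear here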

Every machine of differing architecture needs to have the PVM software compiled and built on it, because the PVM build is hardware-specific. If the hardware is similar between nodes, simply copying the PVM installation between them is fine. But if there are any differences in hardware, you will have to compile PVM machine by machine.
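
If you are unsure what PVM considers a node's architecture, the distribution ships a small script that reports it; two nodes that print the same name can generally share a build. A sketch, assuming the install location used in this article:

/usr/local/pvm3/lib/pvmgetarch    # prints LINUX on a typical Linux/x86 node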

The second clustered machine
With mn nearly ready to go, I needed to get sn1 up and running in the same way. To do so, I placed the pvm3.4.4.tgz file in /usr/local and ran the following commands:
cd /usr/local
tar -zxvf pvm3.4.4.tgz
cd /usr/local/pvm3
export PVM_ROOT=/usr/local/pvm3
make


One last configuration
Before I started the PVM process on my two clustered machines, I created another hosts file that holds only the names of the machines that need to run PVM (their addresses are resolved through /etc/hosts). This file exists only on mn, which is the master node for my cluster and is responsible for starting and stopping the PVM services on the slave nodes (as well as for assigning tasks to them). For this example, I created a file named pvmhosts in /etc with the following contents:
mn
sn1
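
The PVM hostfile format also accepts optional per-host settings after each name. None are needed for this simple setup, but as a hedged illustration drawn from the standard PVM documentation:

# pvmhosts with optional per-host settings
mn
sn1  dx=/usr/local/pvm3/lib/pvmd  ep=/usr/local/pvm3/bin/LINUX

Here, dx= tells the master where the pvmd startup script lives on that host, and ep= sets the search path PVM uses when spawning executables there.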


Starting the PVM services
Starting PVM is as easy as starting any other service under Linux. Running the command /usr/local/pvm3/pvmd & (the & puts the process in the background) starts the PVM daemon. With the daemon running, I started the PVM administration console by typing /usr/local/pvm3/pvm at the command prompt.
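
As an aside, the daemon can also be pointed at the hostfile created earlier, in which case it attempts to start the listed slave nodes automatically; this is standard pvmd behavior, though I used the console's add command instead:

/usr/local/pvm3/pvmd /etc/pvmhosts &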

Help!
To get a list of all of the commands that are available, simply type help at the PVM prompt.

To start the slave node named sn1, it must be added to the master node from within PVM. To do this, type add sn1 at the PVM prompt. Unfortunately, when I checked sn1 using ps -ef | grep pvm, the PVM process was not on the list. After some troubleshooting, I found that I had failed to add an entry to the /root/.bashrc file on sn1, which is critical to the proper functioning of PVM. Since I was using the root user across my cluster, I needed to modify /root/.bashrc and add the following line on each clustered node:
export PVM_ROOT=/usr/local/pvm3

After making this change, I tried the add again. A successful attempt produces a message similar to the following:
pvm> add sn1
add sn1
1 successful
       HOST   DTID
       sn1    100000

To check the status of both of my hosts (mn & sn1), I used the PVM mstat (machine status) command with the node name. The output from this command for my two nodes was:
pvm> mstat mn
mstat mn
       mn    ok
pvm> mstat sn1
mstat sn1
        sn1   ok


While these status codes were not very descriptive, they did tell me that the PVM process was running on both the master and the slave node. As further proof that PVM was running on sn1 as well as mn, I ran ps -ef | grep pvm at the sn1 console; this time, the output showed that the pvmd process on sn1 had been started from mn.
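
Another quick check from the PVM console is the conf command, which lists every host currently in the virtual machine. The output for this two-node cluster looks roughly like the sketch below; the exact DTID, SPEED, and DSIG values will vary:

pvm> conf
conf
2 hosts, 1 data format
                    HOST     DTID     ARCH   SPEED       DSIG
                      mn    40000    LINUX    1000 0x00408841
                     sn1   100000    LINUX    1000 0x00408841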

If you have been following along at home, congratulations! You have now successfully set up a Linux Beowulf cluster based on PVM.

Stopping the PVM processes
If you want to stop a PVM process on the master server or stop a slave process, do one of the following:
  • To stop a particular slave node, enter delete {node name} from the PVM console on the master node. For example: pvm> delete sn1.
  • To start the sn1 node back up, use the add command: pvm> add sn1.
  • To stop all PVM processes across the cluster, issue the halt command: pvm> halt.

It’s all relative
With the information in this article, you should be able to get a very simple Beowulf cluster up and running based on PVM libraries. This cluster definitely works, as the master node was able to start and stop the PVM process on the slave node without having to do any work on the slave node’s console.

Although this example is tiny in comparison to real-world clustering, the work scales predictably. With just a few simple steps, I was able to get a two-node cluster up and running. To estimate the time and effort involved in the initial setup of your own cluster, determine the number of nodes you would need for an effective Beowulf cluster in your organization, then repeat the steps in this example for each additional node. The total time will be greater, but the straightforward nature of the project remains the same. If you have the need and can find the time, a Beowulf cluster might indeed be a realistic computing solution for your enterprise.
