Ensuring high availability for Web services, database servers, e-mail servers, and other critical systems is a requirement for most IT departments. Putting a plan in place and accomplishing that task can go a long way toward easing or eliminating your worst administrative headaches. In the Daily Drill Down “Understanding clustering options for Windows 2000,” I offered an overview of clustering services for Windows 2000. In this Daily Drill Down, I’ll give you tips for using Microsoft Cluster Services (MSCS), the clustering option included with Windows 2000 Advanced Server and Windows 2000 Datacenter Server.

Overview of MSCS
A server cluster is a group of servers that function as a unit to provide an application service. Each server in the cluster is called a cluster node. The primary purpose of a cluster is to provide high availability for applications and data in the event of a server failure. Failover is the term used to describe the process that occurs when an active node in a cluster fails and the clustered application(s) being provided by the failed node migrate to another server. In other words, when the active node for the application fails, a secondary node becomes the primary node for the application. This provides high availability because the user never needs to know the server went down. The user’s interaction with the application continues even though it is now hosted on a different server. When the failed server comes back online, the application fails back to that server, making it the primary, active node for the application once again. Failover support not only provides high availability in the case of unplanned failures but is also an excellent tool to allow you to perform server and application upgrades without affecting service availability.
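
To make the failover and failback sequence concrete, here is a minimal Python sketch of one application on a two-node cluster. It is a toy model for illustration only, not how MSCS is implemented, and the class, method, and node names are all hypothetical.

```python
class ClusteredApp:
    """Toy model of one application hosted on a two-node cluster."""

    def __init__(self, preferred, secondary):
        self.preferred = preferred   # node that normally hosts the application
        self.secondary = secondary   # node that takes over on failure
        self.online = {preferred, secondary}
        self.active = preferred

    def node_failed(self, node):
        """Mark a node as down; fail over if it was hosting the application."""
        self.online.discard(node)
        if node == self.active:
            survivors = [n for n in (self.preferred, self.secondary)
                         if n in self.online]
            self.active = survivors[0] if survivors else None

    def node_recovered(self, node):
        """Mark a node as up; fail back if it is the preferred node."""
        self.online.add(node)
        if node == self.preferred or self.active is None:
            self.active = node


app = ClusteredApp("NODE1", "NODE2")   # hypothetical node names
app.node_failed("NODE1")               # failover: NODE2 becomes active
app.node_recovered("NODE1")            # failback: NODE1 is active again
```

The same sequence covers planned maintenance: take the active node down on purpose, upgrade it, and let the application fail back when the node returns.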

Microsoft introduced MSCS for Windows NT 4.0 as a product code-named Wolfpack. The service, which was an add-on for Windows NT, is now included in Windows 2000 Advanced Server and Datacenter Server but is not available for Windows 2000 Server. MSCS allows you to create clusters of up to two nodes (servers) under Advanced Server and up to four nodes under Datacenter Server. MSCS does not perform load balancing itself; for that, Windows 2000 includes the Network Load Balancing (NLB) service. NLB, which I covered in “Understanding Windows 2000 network load balancing,” provides load balancing for IP-based services under Windows 2000 Advanced Server and Datacenter Server. NLB supports load-balanced clusters of up to 32 nodes and is completely separate from MSCS, so you can use it to provide load balancing for Web servers and other IP services without installing MSCS.

One important consideration when planning an MSCS cluster deployment is that MSCS does not provide dynamic application load balancing. Using Exchange Server as an example, you can’t configure four nodes to run Exchange Server in a cluster and dynamically share the load for 10,000 users. You could, however, configure each server to host 2,500 users and then designate a secondary node for each of the primary nodes, allowing each primary to fail over to its secondary in the event of a failure or required downtime. As you consider your cluster structure, take into account the applications you’ll be running on the cluster nodes, and plan the primary and secondary nodes for each application to satisfy both your availability requirements and your static application load balancing.

Requirements for MSCS
Successfully deploying an MSCS cluster requires careful planning not only of how you will structure applications and nodes but also of hardware and network considerations. Choose servers that satisfy the minimums for Windows 2000 Advanced Server and Datacenter Server and that meet your needs in terms of processor, memory, and boot/system disk for the applications and number of users you will need to support at maximum load.

For example, if you are creating an Exchange Server cluster, size each node as needed to accommodate the number of users it will host. Also factor in the load the server will take on if a primary node fails while the server is acting as that node’s secondary. In other words, plan for the maximum load the server will see when it is acting as the active cluster node for the application at maximum utilization. Whether or not you choose servers that are listed on the Hardware Compatibility List for Windows 2000 is entirely your call. While most systems have no problems running Windows 2000, the hardware expense involved in setting up a cluster warrants extra consideration of compatibility issues.
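
The sizing rule above amounts to simple arithmetic, sketched here in Python. The function and node names are hypothetical, the pairing of nodes is one possible layout rather than a recommendation, and the calculation assumes only one primary fails at a time.

```python
def worst_case_load(users_hosted, secondary_for):
    """Return the peak user count each node should be sized for.

    users_hosted:  dict of node name -> users it hosts as a primary
    secondary_for: dict of primary node -> node designated as its secondary
    Assumes at most one primary fails at any given time.
    """
    peak = dict(users_hosted)
    for primary, secondary in secondary_for.items():
        own = users_hosted.get(secondary, 0)
        # If the primary fails, its secondary carries both user populations.
        takeover = own + users_hosted[primary]
        peak[secondary] = max(peak.get(secondary, own), takeover)
    return peak


# Four nodes hosting 2,500 users each, paired as mutual secondaries.
hosted = {"NODE1": 2500, "NODE2": 2500, "NODE3": 2500, "NODE4": 2500}
pairs = {"NODE1": "NODE2", "NODE2": "NODE1", "NODE3": "NODE4", "NODE4": "NODE3"}
print(worst_case_load(hosted, pairs))
```

With this layout, every node should be sized for 5,000 users even though it normally hosts 2,500.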

In addition to the basic server hardware, each node requires two 100-Mbps PCI network adapters: one for the public cluster interface and one for the private interface. As with the other server items, choose the same hardware for each node for simplicity of configuration and compatibility. For example, use the same video adapter in each server, the same network adapters, and so on. It’s also a good idea to purchase a spare of each component to have on hand in the event of a failure. You might not be able to get the same devices a few years down the road and will be glad you have a replacement that doesn’t require any reconfiguration or driver changes. Install adapters in the same slots from one node to another, creating systems that are clones of one another.

Unlike the cluster service in Microsoft’s Application Center 2000, MSCS is a shared-media clustering service. The nodes in a cluster each have their own system/boot disks but share a common cluster disk subsystem for cluster applications and data. So, you’ll need an HCL-listed SCSI or Fibre Channel shared disk storage unit that can connect to all of the nodes. Each node requires a PCI host adapter to connect to the shared storage in addition to the host adapter for its local disks. The shared disk should be a RAID 1 array at a minimum, but RAID 5 or better is recommended for redundancy. The storage unit should support hot swapping of drives and hot spares so you can replace a failed drive without taking down the cluster. Whether you use RAID 5 arrays for each server’s local disk depends on the amount of money you’re willing to invest, but I strongly recommend it to eliminate disk failures as a potential cause for server failures. After all, ensuring 100 percent availability is your primary reason for the cluster.

As for software, you’ll need either Windows 2000 Advanced Server or Datacenter Server. Some method for name resolution is also required, such as DNS or WINS. These services should be hosted on servers outside of the cluster. However, you can use Hosts and Lmhosts files in lieu of DNS or WINS for name resolution, if necessary. Verify that the applications you need to run in the cluster are cluster-aware versions to enable the applications to support failover and failback. Finally, you should implement some means of remote administration for the cluster, such as Terminal Services or a third-party utility like pcAnywhere, Remotely Possible, or VNC.

Preparing the servers
The first step in setting up your cluster is to set up and verify the operation of each cluster node. Initially, the nodes need not be connected to a network. You can install Windows 2000 on each one, install necessary hardware drivers, and get each server up and running as a basic server. When it’s time to build the cluster, however, you’ll need to shut down all nodes and bring them up for configuration one at a time.

As part of the installation process, you need to configure the servers for the shared storage device. All devices will share the same bus, so each device must have a unique ID. SCSI controllers typically default to SCSI ID 7, so you’ll need to change the configuration on each server so the adapters each use a unique ID. In a two-node cluster, for example, configure the host adapter on one node as SCSI ID 7 and the other node as ID 6. In a four-node cluster, the adapter IDs might be 7, 6, 5, and 4, leaving 0 through 3 for the disks. Check the documentation for the host adapters to determine if they reset the SCSI bus by default when they initialize during boot. If so, disable bus reset, if possible, to reduce the possibility of data transfer interruptions between other nodes when a server boots. Verify that the bus is properly terminated.
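
The ID assignment described above can be sketched in a few lines of Python. This is only an illustration with hypothetical names, and it assumes a narrow SCSI bus with IDs 0 through 7; the actual change is made in each host adapter's configuration utility.

```python
HIGHEST_ID = 7  # assumption: narrow SCSI bus with IDs 0 through 7


def assign_scsi_ids(node_names):
    """Give each node's host adapter a unique ID, counting down from 7.

    Returns (adapter_ids, disk_ids): a dict of node -> adapter ID and
    the list of IDs left over for the shared disks.
    """
    if len(node_names) > HIGHEST_ID:
        raise ValueError("no IDs would be left for the shared disks")
    adapter_ids = {name: HIGHEST_ID - i for i, name in enumerate(node_names)}
    disk_ids = list(range(HIGHEST_ID + 1 - len(node_names)))
    return adapter_ids, disk_ids


adapters, disks = assign_scsi_ids(["NODE1", "NODE2", "NODE3", "NODE4"])
print(adapters)  # {'NODE1': 7, 'NODE2': 6, 'NODE3': 5, 'NODE4': 4}
print(disks)     # [0, 1, 2, 3]
```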

The network topology for the cluster is another issue to consider when setting up the servers. You’ll need a unique NetBIOS name for the cluster, two unique static IP addresses for each node (one for each adapter), and a unique IP address for the cluster. A four-node cluster therefore requires nine unique, static IP addresses.
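
As a planning aid, the address count can be worked out with Python's standard ipaddress module. Everything specific in this sketch is an assumption made for illustration: the node and cluster names, the two address ranges, the use of a separate subnet for the private adapters, and the placement of the cluster address on the public network.

```python
import ipaddress


def plan_addresses(node_names, cluster_name, public_net, private_net):
    """Reserve one cluster address plus a public and private address per node."""
    public_hosts = iter(ipaddress.ip_network(public_net).hosts())
    private_hosts = iter(ipaddress.ip_network(private_net).hosts())
    # The cluster itself gets a single address on the public network.
    plan = {cluster_name: {"public": str(next(public_hosts))}}
    for name in node_names:
        plan[name] = {
            "public": str(next(public_hosts)),
            "private": str(next(private_hosts)),
        }
    return plan


def count_addresses(plan):
    """Total number of static addresses the plan reserves."""
    return sum(len(addresses) for addresses in plan.values())


plan = plan_addresses(
    ["NODE1", "NODE2", "NODE3", "NODE4"],  # hypothetical node names
    "CLUSTER1",                            # hypothetical cluster name
    "192.168.1.0/24",                      # hypothetical public range
    "10.0.0.0/24",                         # hypothetical private range
)
print(count_addresses(plan))  # 9
```

In general, a cluster of n nodes needs 2n + 1 static addresses.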

Finally, all cluster members must be members of the same domain, either as member servers or as domain controllers. If you decide to configure the servers as domain controllers, configure all nodes in the cluster as domain controllers. Do not use a mix of domain controllers and member servers. Also, for performance reasons, you should not use cluster nodes as domain controllers for a production domain with a significant number of users. Keeping that role off the cluster reduces both network traffic to and from the cluster and the workload on each server.
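
The domain rules above are easy to check against a planning worksheet before you build anything. The following Python sketch works on data you enter by hand and does not query the servers; the function name, role labels, and sample names are hypothetical.

```python
def check_domain_roles(nodes):
    """Return a list of problems with the planned domain membership.

    nodes: dict of node name -> (domain, role), where role is
    "member" for a member server or "dc" for a domain controller.
    """
    problems = []
    domains = {domain for domain, _ in nodes.values()}
    if len(domains) > 1:
        problems.append("nodes belong to different domains: %s" % sorted(domains))
    roles = {role for _, role in nodes.values()}
    if len(roles) > 1:
        problems.append("nodes mix domain controllers and member servers")
    return problems


planned = {
    "NODE1": ("CORP", "member"),  # hypothetical domain and node names
    "NODE2": ("CORP", "dc"),
}
for problem in check_domain_roles(planned):
    print(problem)
```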

If you’re concerned about providing fault tolerance and high availability on your network, you’re probably interested in using a clustering solution. Fortunately, Microsoft provides one for Windows 2000 using MSCS. However, MSCS isn’t one of those things that you can just blindly install on your server. To minimize headaches, you’ll need to do a little bit of planning.