Windows 2000 Cluster Service can reduce maintenance downtime

The Windows 2000 Cluster Service can be a valuable asset when updating and patching your software. Learn about this benefit, and see how you can evaluate the Cluster Service without purchasing special hardware.

Microsoft’s Cluster Service gained momentum in Windows NT and was carried into Windows 2000 Advanced Server and Datacenter Server with multiple improvements and additions. It’s perhaps best known for its role in helping mitigate server failure with mission-critical client/server applications, like SQL Server and Exchange Server. As a result, many administrators don’t realize that they can also take advantage of this service for their less glamorous departmental servers.

I’m going to focus here on what I think is one of Cluster Service’s lesser-known benefits: allowing rolling upgrades to ensure minimal maintenance downtime. I’ll also explain how you can evaluate the Cluster Service without any special hardware.

Rolling upgrades
In the past, most clustering involved expensive hardware and software, and some admins probably considered it to be an unnecessary complication and a waste of money. After all, our backup routines are fine-tuned and standard distribution images are at the ready; hardware RAID protects critical data; monitoring programs alert us to potential problems before failure occurs; and today's hardware has generally become much more reliable.

However, clustering really shines today in reducing maintenance downtime. New service packs, patches, and critical hot fixes for the operating system or a specific application arrive frequently, and these updates almost invariably require a reboot.

Rebooting means planned downtime, and changes mean risk and testing. Together, they can add up to loss of service to clients or to expensive and inconvenient after-hours work for administrators. In addition, many companies have strict policies on rebooting production servers (including paperwork and authorization). The resulting delays can leave your server and network resources vulnerable when an important security fix or patch needs to be applied.

But if your server is in a cluster, and client resources on it are cluster-aware (file and print services, WINS, DHCP, IIS, etc.), you can simply move the resources to the other server and leave your current server free to apply the updates. Just reboot when instructed, take your time to ensure that everything seems to be working normally, and then move the services back to the original server when convenient. You can then upgrade the other server so both are running identical versions of software.
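The ordering of that procedure can be sketched as a small simulation. This is only an illustrative model, not Cluster Service code: the Node class, the move_groups and rolling_upgrade functions, the node names, and the "SP1"/"SP2" version labels are all hypothetical, and patching is reduced to changing a version string.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one server in a two-node cluster."""
    name: str
    version: str
    groups: set = field(default_factory=set)


def move_groups(source, target):
    """Move every resource group from source to target (a manual move)."""
    target.groups |= source.groups
    source.groups = set()


def rolling_upgrade(first, second, new_version):
    """Patch two nodes one at a time so that one node always hosts the groups.

    Returns a log of (patched node, hosting node, groups hosted) tuples.
    """
    # Remember which node each group normally lives on.
    home = {first.name: set(first.groups), second.name: set(second.groups)}
    log = []

    for node, partner in ((first, second), (second, first)):
        move_groups(node, partner)       # drain the node to be patched
        node.version = new_version       # stand-in for "apply update, reboot, verify"
        log.append((node.name, partner.name, sorted(partner.groups)))

    # Return the groups to their original owners once both nodes match.
    first.groups = home[first.name]
    second.groups = home[second.name]
    return log


if __name__ == "__main__":
    node1 = Node("NODE1", "SP1", {"File Share", "Print Spooler"})
    node2 = Node("NODE2", "SP1", {"DHCP"})
    for entry in rolling_upgrade(node1, node2, "SP2"):
        print(entry)
```

The point of the model is the order of operations: a node is patched only while its partner is hosting every group, so clients see a brief move rather than a full reboot.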

This procedure is referred to as a “rolling upgrade,” and although more traditionally documented in terms of upgrading NT4 to Windows 2000 (a one-time event), it is far more applicable to most network administrators today in terms of applying service packs, patches, and fixes. If you’ve been applying a number of security hot fixes recently in an attempt to keep your servers secure (and who hasn’t?), I’m sure you’ll appreciate this Cluster Service feature.

Even applications that are not cluster-aware may be able to use rolling upgrades, if they adhere to these rules:
  • Do not store program or temporary files on a shared disk.
  • Do not delete registry keys.
  • Do not change data structures.

But does the convenience of clustering come at too high a cost? The answer depends on the price your company is prepared to pay for maintenance downtime and its associated risks. Remember that in addition to extra hardware and another server license, you must be running Windows 2000 Advanced Server (or Datacenter Server) to use the Cluster Service. It is not supported on the standard version of Windows 2000 Server.

For example, if you installed an upgrade that failed (or left your server unstable) and couldn’t be uninstalled, the downtime would also include the time needed to restore from backup. A wise manager might also want to factor in risks that accompany restores (the restore fails, requiring a full reinstallation and configuration of the operating system and services).

For some companies, it’s a cost equation worth considering even for departmental servers, especially once you add the other benefits that Cluster Service offers, such as high availability of services during unplanned downtime.
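To make that cost equation concrete, here is a rough back-of-the-envelope sketch. Every figure and function name in it is a hypothetical placeholder, not data from any real site; substitute your own estimates for reboot time, failure probabilities, restore time, and hourly cost.

```python
def expected_patch_downtime(reboot_hours, p_bad_patch, restore_hours,
                            p_restore_fails, rebuild_hours):
    """Expected downtime (hours) for one patch on a standalone server.

    A bad patch forces a restore from backup; a failed restore forces a
    full reinstallation on top of that.
    """
    recovery_hours = restore_hours + p_restore_fails * rebuild_hours
    return reboot_hours + p_bad_patch * recovery_hours


def annual_downtime_cost(patches_per_year, hours_per_patch, cost_per_hour):
    """Yearly cost of patch-related downtime."""
    return patches_per_year * hours_per_patch * cost_per_hour


def cluster_pays_off(standalone_cost, clustered_cost, extra_cluster_cost_per_year):
    """True if the downtime cost avoided exceeds the extra yearly cluster cost."""
    return standalone_cost - clustered_cost > extra_cluster_cost_per_year


if __name__ == "__main__":
    # All numbers below are made-up assumptions for illustration only.
    hours = expected_patch_downtime(0.5, 0.1, 4.0, 0.2, 8.0)
    standalone = annual_downtime_cost(12, hours, 500.0)
    clustered = annual_downtime_cost(12, 0.05, 500.0)  # assumes a brief move per patch
    print(hours, standalone, clustered)
    print(cluster_pays_off(standalone, clustered, 5000.0))
```

The model deliberately ignores unplanned outages, so it understates the case for clustering; it only captures the maintenance side of the argument.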

Evaluating Cluster Service without special hardware
One of the main stumbling blocks to learning more about the Cluster Service and evaluating its potential is that it requires special hardware (in addition to requiring Windows 2000 Advanced Server). At a minimum, this service needs an external SCSI hard disk on a shared bus, which must be properly terminated with either Y cables with terminators or self-terminating host adapter cards. Two PCI network adapters are also required. Strictly speaking, you can use only one in an unsupported configuration, but adding another adapter is not much of an issue for most people.

When you install Cluster Service (from Add/Remove Windows Components), it automatically invokes its configuration wizard. A pop-up dialog box reminds you of the importance of using hardware that is listed under the Cluster category on Microsoft's Hardware Compatibility List. Microsoft supports only complete cluster configurations that appear on that list, not clusters assembled from individually listed devices.

After you specify that you want to create a new cluster, the wizard checks for a shared SCSI bus and will not proceed if it fails to find one. Your only choice is to click Cancel.

So you’ve got to make that initial hardware investment before you can even load up the service to play around with Cluster Administrator. Or do you? A little-known technique allows you to get around this requirement by configuring Cluster Service with a local Quorum.

The Quorum is a critical Cluster Service resource that, by default, is configured on the shared external disk. The Quorum has to exist for the Cluster Service to load. It holds the most up-to-date information on the cluster configuration so that additional joining servers can obtain this configuration. It can also act as arbitrator, deciding which server owns a resource in the case of network failure.

Without the Quorum on an external disk, other servers cannot join the cluster, and resources can't fail over to another server. So if the Cluster Service can't configure the Quorum on an external disk, it makes sense for it to fail to install. However, you can force Cluster Service to install with a local Quorum for evaluation purposes. The trick is figuring out how to do it, because the GUI doesn’t offer this option.

The evaluation workaround
The technique picks up after you click Cancel in the configuration wizard. Type %windir%\cluster\cluscfg.exe -L from a command prompt (or directly in the Run dialog box). This manually loads the same configuration wizard as before, but the -L switch instructs it to install the Quorum on your local hard disk. You can now proceed through the configuration wizard. (I'll explain this in further detail in my next article.) After the wizard finishes, you can load Cluster Administrator from Administrative Tools.

Using a local Quorum allows you to access most of the configurations you would have on a normal cluster, but you won’t be able to add other servers to it and consequently won’t be able to fail over resources. However, a single-server cluster is usually sufficient to evaluate the basics and check the online help. Once you can see the configuration options in front of you, assembling the pieces of the jigsaw (creating groups, resources, and failover and failback properties) is much easier than trying to work out how they all fit together just by looking at the documentation.

If you haven’t previously considered using Windows 2000 Cluster Service, I hope I’ve persuaded you to take another look at it for some of its lesser-known benefits. Most notably, it enables rolling upgrades, which can help with the everyday struggle to keep your Win2K systems up to date while minimizing downtime. I'll look at additional benefits in follow-up articles.

