Most IT professionals now work in environments where high-availability systems are the standard. No longer can e-mail and other mission-critical systems suffer downtime due to a single server or piece of hardware failing. Many new products and services have evolved to meet the rapidly growing demand for high-availability systems. Most of these services concentrate on providing redundancy and backup systems in the event of a server or group of servers failing, but few respond to the possibility that an entire physical location will fail. In this article, we’ll look at NSI’s GeoCluster, which addresses this issue for organizations that rely on Microsoft Exchange for messaging and collaboration.
The challenge in trying to bulletproof Exchange
Recent events such as the power grid problems in California and the earthquake in the Seattle area underscore the need for data systems that transcend physical boundaries to provide true redundancy in the event of a physical catastrophe. Exchange 5.5 systems, which use a sensitive database-driven information store, are especially vulnerable to this type of issue because if more than one copy of an information store (e.g., a backup kept off-site) is brought up live at the same time, irreversible corruption can occur.
Under standard Windows clustering services, a shared disk array is generally mandatory. The shared array makes it impossible for more than one copy of an Exchange information store to be live at any given time, but it also forces the entire cluster to reside in one physical location. Microsoft Cluster Services (MSCS) can be tricked into installing on and using local disk resources, but this leaves the system open to the multiple-information-store corruption described above.
Enter NSI Software’s GeoCluster product. I’ve been working with this software at my company for some time, both testing internally and implementing at client sites. So far, with Windows NT and Exchange 5.5 (the most common combination of platforms for the Windows environment), this one product has addressed all of the needs described above. This first article outlines what GeoCluster does and walks through the basic install of a Windows NT Enterprise/GeoCluster cluster to prepare for Exchange 5.5 setup. My second article will outline the Exchange cluster setup and its special requirements.
Using GeoCluster as a solution
The GeoCluster product extends both NSI’s existing volume-replication software (Double-Take) and MSCS to provide a robust and redundant platform for creating Exchange (and other) clusters that span multiple physical locations. Because of the limitations of MSCS, an NT Enterprise cluster can have only two nodes and one Exchange server; however, multiple clusters can be run within the same Exchange Site should the organization require additional servers.
The installation begins with the usual setup of NT Enterprise server for Exchange. Microsoft recommends a set of mirrored drives for the NT system files, another set of mirrored drives for the Exchange system files and logs, and a RAID 5 array for the information store and related databases. I generally use a 4-GB slice for the NT system and at least that much for the logs (sometimes more, depending on the installation). Once the NT system is installed, you install MSCS with a command-line switch that tricks the system into using the local disks as the quorum resource (see the GeoCluster documentation for details). GeoCluster is then installed over MSCS, extending the system and installing proprietary software. Be sure to reapply your service packs after installation.
During the installation of MSCS and GeoCluster, you will identify both the public network, over which users will attach to Exchange, and the private network, a dedicated network that carries heartbeat, failover, and replication traffic. Each node should have two or more NICs to avoid potential IP conflicts or traffic bottlenecks. It is also recommended that the private network be isolated from the public network, for example over a dedicated T-1 or frame relay link.
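One quick way to sanity-check an addressing plan before installation is to confirm that the public and private subnets share no addresses. The following is a minimal sketch using Python's standard ipaddress module. The subnets shown are hypothetical examples, not values from the GeoCluster documentation, and the function name is an illustrative assumption.

```python
import ipaddress


def networks_are_isolated(public_cidr: str, private_cidr: str) -> bool:
    """Return True when the two subnets have no addresses in common."""
    public = ipaddress.ip_network(public_cidr)
    private = ipaddress.ip_network(private_cidr)
    return not public.overlaps(private)


if __name__ == "__main__":
    # Hypothetical plan: users reach Exchange on 192.168.10.0/24,
    # while heartbeat and replication use a small point-to-point subnet.
    print(networks_are_isolated("192.168.10.0/24", "10.0.0.0/30"))
```

A check like this only covers IP planning. It says nothing about whether the private link has enough bandwidth for replication traffic.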
Once the network links are in place, you set up the quorum resource. This is a physical or logical volume on the local machine that will be replicated to the other node in the cluster and used to determine which node has “rights” to take control of the system in the event of a failover. During the quorum setup, you select arbitration paths, a vital part of the system.
Arbitration paths are zero-byte files that reside on any network share in the LAN or WAN. These files are locked by the controlling node of the cluster at start-up and remain locked (expiring and renewing every 10 seconds) until a failover event occurs. For the purposes of GeoCluster, a failover event is initiated when a set amount of time passes during which no heartbeat information traverses the network. You set this timeout at setup, and you can change it as needed. Should a failover event occur, each node in the cluster attempts to take control of the arbitration paths distributed throughout the network.
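The heartbeat timeout described above can be illustrated with a small sketch. This is not GeoCluster's implementation. The class name, the 30-second value, and the injectable clock are assumptions made purely for illustration.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from the partner node (illustrative only)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # chosen at setup, adjustable later
        self._clock = clock                     # injectable so the logic can be tested
        self._last_seen = clock()

    def record_heartbeat(self):
        """Call whenever heartbeat traffic arrives over the private network."""
        self._last_seen = self._clock()

    def failover_due(self):
        """True once the timeout has passed with no heartbeat."""
        return self._clock() - self._last_seen >= self.timeout_seconds


if __name__ == "__main__":
    fake_now = [0.0]  # simulated clock so the example runs instantly
    monitor = HeartbeatMonitor(timeout_seconds=30, clock=lambda: fake_now[0])
    fake_now[0] = 10.0
    monitor.record_heartbeat()      # partner is still alive at t=10
    fake_now[0] = 45.0              # 35 seconds of silence
    print(monitor.failover_due())
```

Passing the clock in as a parameter keeps the timing logic separate from real time, which makes it easy to try different timeout values without waiting.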
If one node goes offline, crashes, or otherwise loses the ability to lock the arbitration paths, the other node will be able to lock them after the 10-second timeout expires. The first node to lock more than 50 percent of the arbitration paths assumes control of the cluster, so it is vital that the paths are well distributed throughout the physical network. I’ll discuss the remainder of the failover process in the second part of this series.
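The majority rule is easier to see in a small simulation. The sketch below models each arbitration path as a lock that expires after 10 seconds unless renewed, and treats a node as controlling the cluster only when it holds more than half of the paths. The class and function names are hypothetical, and real arbitration relies on file locks on network shares rather than in-memory objects.

```python
LOCK_LIFETIME = 10.0  # seconds; locks expire unless the holder renews them


class ArbitrationPath:
    """In-memory stand-in for one zero-byte arbitration file (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.holder = None
        self.expires_at = 0.0

    def try_lock(self, node, now):
        """Lock or renew the path if it is free, already ours, or expired."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + LOCK_LIFETIME
            return True
        return False


def has_control(node, paths, now):
    """Attempt every path; control requires more than 50 percent of them.

    Note: any locks acquired here are kept even if the majority is not reached.
    """
    locked = sum(1 for path in paths if path.try_lock(node, now))
    return locked * 2 > len(paths)


if __name__ == "__main__":
    paths = [ArbitrationPath(f"share{i}") for i in range(3)]
    print(has_control("NODE_A", paths, now=0.0))   # A starts up and locks all paths
    print(has_control("NODE_B", paths, now=5.0))   # A's locks are still valid
    print(has_control("NODE_B", paths, now=10.0))  # A stopped renewing; locks expired
```

With an even number of paths, two nodes could each hold exactly half and neither would win, which is one more reason to think carefully about how many paths you create and where you place them.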
Once the quorum resource is defined, you create the resource group into which Exchange will be installed. You do this just as you would with standard MSCS, keeping in mind that you’ll use physical disk resources attached to this node as opposed to a shared storage array. Generally speaking, you set up a network name, an IP address, and the logical disks that will be shared by both nodes in the cluster.
At this point, you will set up your second node for your cluster. I usually recommend that you set up the second node at the same physical site as the first until the install is finished. After that point, you can move the second node to a different physical location. Setting up the second node on-site makes the procedure simpler and faster.
The major difference in the second install is that you will not specify any resources for this cluster. Simply install NT, MSCS, and GeoCluster and join this node to the existing cluster. You will immediately see many GeoCluster Replicated Disk Governor objects spring up in the Cluster Administrator. These are dynamically created resources that GeoCluster utilizes to ensure that a failover cannot occur if the volumes are out of sync. They will remain visible until the replication of the shared disks is complete, at which time they are dynamically destroyed.
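Conceptually, the Governors act as a gate: while any replicated volume is still catching up, failover is blocked. The sketch below illustrates that idea only. The data structure, the byte counter, and the function names are assumptions and do not reflect how GeoCluster tracks replication internally.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedVolume:
    name: str
    bytes_pending: int  # data not yet mirrored to the other node (hypothetical metric)


def active_governors(volumes):
    """One governor exists for each volume that is still out of sync."""
    return [f"Governor for {v.name}" for v in volumes if v.bytes_pending > 0]


def failover_allowed(volumes):
    """Failover is permitted only when no governors remain."""
    return not active_governors(volumes)


if __name__ == "__main__":
    volumes = [ReplicatedVolume("E:", 0), ReplicatedVolume("F:", 4096)]
    print(active_governors(volumes))   # F: is still replicating
    print(failover_allowed(volumes))

    volumes[1].bytes_pending = 0       # replication completes
    print(failover_allowed(volumes))
```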
Once the Governors are removed, you are ready to proceed to the install of Exchange Enterprise Server. The next article will cover this in detail and offer an overview of failover procedures.