Last week, I gave a brief overview of how to set up an NT Enterprise/GeoCluster clustering system to run MS Exchange 5.5 Enterprise Server. This article walks through the process of installing Exchange itself and outlines what happens during a failover event.

Installing Exchange
To pick up where we left off, we have two nodes in a Microsoft Cluster Services (MSCS) cluster that have been extended with NSI Software’s GeoCluster product so that the cluster does not require a shared disk resource. GeoCluster replicates logical volumes on each node to keep the two in sync and handles heartbeat monitoring and failover arbitration.

Now you’re ready to install Exchange 5.5 Enterprise Server. Installation on the first (currently active) node in the cluster immediately pops up an alert dialog stating that a cluster has been detected and that the cluster-aware version of the Exchange software will be installed. This is not the only sign that the installation is out of the ordinary: many of the options available during a standard install are conspicuously absent. About the only setting you can specify is the network name of the Exchange server, which you previously set up during the MSCS install.

Otherwise, the Exchange installation is pretty much automatic and relatively painless. You will notice a few oddities, such as the fact that the Exchange services are set to manual startup. Do not attempt to correct this, as the cluster services will start and stop these services as necessary. You can proceed immediately to installing Exchange on node 2, where you will be asked for even less information. Then apply Service Pack 3 or higher for Exchange to each node. Go with the latest available, which is SP4 at the time of this writing.

Working with your Exchange cluster
Once Exchange is in place, you may populate the public folders, add users, or do any other routine Exchange tasks. Just remember, when you open the Exchange Administrator, to specify the network name of the Exchange Cluster Resource that you defined when you installed MSCS. Changes you make will replicate to both nodes in the cluster automatically, and users can immediately connect to the Exchange server via the network name of the Exchange Cluster Resource.

You have one other step to take care of if you are migrating from an existing Exchange server to a clustered environment. If you join the cluster to the existing site, you can replicate public folders and move mailboxes into the cluster as if it were any other Exchange server in the site. Just remember to follow Microsoft’s Knowledge Base articles on removing the first Exchange server of a site if you plan to fully migrate to the cluster.

Once the Exchange system is set up and you’ve got things looking the way you want them, you can move the second node to another physical location. As long as the private and public networks bridge the two locations (so that the same subnets span both locations), heartbeat traffic will be uninterrupted and failover can still occur normally.
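Before moving a node, it can help to confirm on paper that both nodes' addresses really fall inside the same subnets. The following is a minimal Python sketch of that check. The addresses, prefix lengths, and function names are hypothetical placeholders, not values from this setup, and the script only compares addresses; it does not test actual connectivity between the sites.

```python
import ipaddress

def same_subnet(addr_a, addr_b, prefix_len):
    """Return True if both addresses fall in the same subnet of the given prefix length."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b

def check_networks(networks):
    """Map each network name to whether node 1 and node 2 share its subnet."""
    return {name: same_subnet(a, b, p) for name, (a, b, p) in networks.items()}

# Hypothetical addressing plan: (node 1 address, node 2 address, prefix length).
planned = {
    "public": ("192.168.10.11", "192.168.10.12", 24),
    "private": ("10.0.0.1", "10.0.0.2", 24),
}

for name, ok in check_networks(planned).items():
    print(name, "spans both locations" if ok else "is split across subnets")
```

If either network reports a split, the bridging between the two locations needs another look before node 2 is relocated.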

When moving the second node, the recommended method is to shut down both nodes, starting with node 2. Then move the second node and bring both nodes back up, starting with node 1. Check connectivity in both locations and adjust your arbitration paths (which can be changed at any time) to fit your new physical topology.

Understanding how GeoCluster operates
Congratulations—you now have an Exchange 5.5 cluster that can withstand the total destruction of one physical location. But what happens if that should occur? I hope you never have a failover caused by anything more severe than a power failure, but the failover procedure is the same no matter what the cause.

Roughly 10 seconds after the first node goes offline for any reason, the heartbeat timeout (a value you set) expires and initiates a failover event. The second node attempts to lock as many of the arbitration paths as it can. If the first node comes back online, it too will start locking paths. Whichever node controls more than 50 percent of the paths assumes control of the cluster, and the other takes itself offline. This ensures that only one copy of the information store and other databases can be live at any given time.
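The majority rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not GeoCluster's actual implementation: the function names, the path names, and the 10-second default are assumptions made for the example. The article does not say what happens when neither node holds more than half of the paths, so the sketch simply reports no winner in that case.

```python
from typing import Dict, Optional

def heartbeat_expired(last_heartbeat: float, now: float, timeout: float = 10.0) -> bool:
    """True once more than `timeout` seconds have passed since the last heartbeat."""
    return (now - last_heartbeat) > timeout

def arbitrate(locks: Dict[str, str], total_paths: int) -> Optional[str]:
    """Return the node holding more than 50 percent of all arbitration paths.

    `locks` maps a path name to the node that locked it. Paths nobody has
    locked are simply absent. Returns None if no node has a strict majority.
    """
    counts: Dict[str, int] = {}
    for node in locks.values():
        counts[node] = counts.get(node, 0) + 1
    for node, held in counts.items():
        if held * 2 > total_paths:  # strictly more than half
            return node
    return None

# Hypothetical scenario: three arbitration paths, node 2 locks two of them.
locks = {"path_a": "node2", "path_b": "node2", "path_c": "node1"}
if heartbeat_expired(last_heartbeat=0.0, now=10.5):
    print("cluster owner:", arbitrate(locks, total_paths=3))
```

Using an odd number of arbitration paths in a sketch like this avoids an even split between the two nodes.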

If the second node successfully locks the required number of paths, the cluster services automatically bring all resources online on that node, which assumes the network name and IP address of the shared resources. The outside world won’t notice anything different, and Exchange connectivity will be down only for the roughly 30 seconds it takes for the failover to occur. After that, users connect to the second node automatically, and MAPI, IMAP, OWA, and POP clients pick up right where they left off. Once the problem with the first node has been corrected, you can initiate a series of events to fail the servers back and return everything to its original state.
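If you want to verify the failover window in your own test runs, one approach is to probe the cluster's network name at regular intervals and then look for the longest gap. The Python sketch below covers only the analysis step and assumes you have already collected timestamped probe results by some means of your own. The function name and the sample data are hypothetical.

```python
from typing import List, Tuple

def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Return the longest outage, in seconds, found in time-ordered probe samples.

    Each sample is (timestamp_seconds, reachable). An outage runs from the
    first failed probe to the next successful one. An outage that never
    ends within the samples is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, reachable in samples:
        if not reachable and outage_start is None:
            outage_start = timestamp
        elif reachable and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest

# Hypothetical probe log: the server is unreachable from t=5 until t=35.
probes = [(0.0, True), (5.0, False), (10.0, False), (35.0, True), (40.0, True)]
print("longest outage (s):", longest_outage(probes))
```

The measured figure depends on how often you probe, so a shorter interval gives a tighter estimate of the real outage.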

NSI’s GeoCluster, combined with careful network planning and hardware implementation, can create Exchange systems that can survive just about any disaster without apparent downtime. Your users will stay up, your data will remain safe, and you’ll sleep through the night—or at least not be kept awake by an Exchange server going down.

Does your organization need Exchange clustering?

What disaster recovery options do you have in place for your Exchange servers? Are you going to consider the GeoCluster product? We look forward to getting your input and hearing about your experiences with this topic. Join the discussion below or send the editor an e-mail.