SolutionBase: Increase reliability and performance of your Exchange-based e-mail system with clustering

Does your network have a lone server to handle a heavy amount of e-mail traffic? You probably can't afford to have e-mail down for even a minute. Clustering can come in handy in these types of situations; Brien Posey offers some tips about how to properly cluster Exchange servers.

This article is also available as a TechRepublic download.

E-mail is such a mission-critical task that chances are you can't afford to have it down for even a minute. You may also have so much e-mail traffic on your network that one lone server may not be able to handle all of the work. In either case, clustering can come in handy. Here are some tips about how to properly cluster Exchange servers.

Load-balancing servers

Network Load Balancing (NLB) is a clustering technology that can be used as a clustering solution for Microsoft Exchange front end servers. Each server in an NLB cluster is equipped with two different IP addresses. One of these IP addresses is unique to the individual server, while the other IP address is shared by all of the servers in the cluster. Generally, a firewall's port forwarding feature is configured to forward all traffic that is destined for a front end server to the shared IP address. From there, the NLB service determines which server within the load balancing cluster should service the request.

As I'm sure you know, a front end Exchange Server is really nothing more than an IIS server that happens to be hosting the OWA virtual directory. As such, it really doesn't matter which front end server receives a request from a remote client. All of the front end servers work in exactly the same way. When a remote user authenticates against a front end server, the front end server looks at the msExchHomeServerName attribute in the Active Directory to see which backend server is hosting the user's mailbox.
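The routing step described above amounts to a simple lookup: the front end reads an attribute on the user object and proxies the request to whichever backend it names, which is why any front end can service any user. Here is a toy Python sketch of that idea; the user and server names are hypothetical, and the real lookup of course goes through Active Directory rather than a dictionary:

```python
# Toy model of front end routing: the backend is chosen by an attribute
# lookup on the user, not by which front end received the request.
# All names below are made up for illustration.
mailbox_homes = {
    "asmith": "BACKEND01",  # stand-in for the msExchHomeServerName attribute
    "bjones": "BACKEND02",
}

def route_request(user: str) -> str:
    """Return the backend server hosting the user's mailbox."""
    return mailbox_homes[user]

print(route_request("asmith"))  # BACKEND01
print(route_request("bjones"))  # BACKEND02
```

Because every front end performs the same lookup, the servers are interchangeable, which is exactly what makes load balancing them safe.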

The primary function of the NLB service is to reduce response time. By distributing inbound requests across multiple servers, responses to client requests take less time than they would if OWA were hosted on a single server. The NLB service also provides a degree of fault tolerance. If one of the servers in the cluster fails, then the remaining cluster nodes pick up the slack until the failed server comes back online.

The primary thing to keep in mind when creating an NLB cluster is that each cluster node must be configured identically. If you make a customization to the Exchange virtual directory, then you will need to make the same customization on each server in the cluster. Otherwise, users will have an inconsistent experience when they log in, because the experience would differ depending on which server happens to service their request.

One issue along these lines that is worth considering is that patches and service packs applied to one server in the cluster should be applied to the other servers in the cluster. Of course, it's always a good idea to be consistent with your patches, but in this particular case patching inconsistencies can lead to inconsistencies in the user experience. Imagine, for instance, that a new service pack made a change to OWA. If you do not apply the service pack to all of the servers in the cluster, then the user experience would vary depending on which server serviced a user's request.

DNS round robin

Although using the NLB service is the preferred method for clustering Exchange front end servers, it's not the only technique that you can use. An alternative technique is the DNS round robin approach.

The DNS round robin approach to load balancing doesn't require you to create a front end cluster. Instead, you'd create multiple, independent front end servers, each with a unique IP address. You would then configure the DNS server so that it directs inbound requests to each of your front end servers in a round robin fashion.

Implementing DNS based load balancing is simple. Begin by opening the DNS management console. When the console opens, right-click on your DNS server and select the Properties command from the resulting shortcut menu. When you do, you'll see the server's properties sheet. Now, select the properties sheet's Advanced tab and make sure that the Enable Round Robin check box is selected (it's selected by default). Click OK to close the properties sheet and then navigate through the console tree to the forward lookup zone for your domain. Now just create a host record for each of your front end servers. Each host record that you create should reflect the various servers' unique IP addresses, but each record should use the same fully qualified domain name.
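Conceptually, round robin works because the DNS server rotates the order of the matching host records on each response, and clients take the first address returned. A minimal Python sketch of that rotation, using placeholder addresses for three hypothetical front end servers:

```python
from collections import deque

# Hypothetical A records for three front end servers that all share
# one fully qualified domain name.
records = deque(["192.168.0.10", "192.168.0.11", "192.168.0.12"])

def resolve() -> str:
    """Return the first A record, then rotate so the next query
    sees a different record first (DNS round robin behavior)."""
    answer = records[0]
    records.rotate(-1)
    return answer

print([resolve() for _ in range(4)])
# ['192.168.0.10', '192.168.0.11', '192.168.0.12', '192.168.0.10']
```

Each front end server therefore receives roughly an equal share of new connections over time, even though no single component is tracking load.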

When you're done creating the necessary records, you can test the round robin forwarding by using the Nslookup command against the fully qualified domain name associated with your front end servers. For example, if that name were mail.contoso.com (a hypothetical name), you would use the following command: NSLOOKUP mail.contoso.com

More than likely, you'll have to enter the command several times. As you do, watch the IP address that is returned. You should occasionally see a different IP address. This confirms that round robin forwarding is working correctly.

Although it's easier to configure DNS round robin load balancing than it is to implement the NLB service, the DNS round robin technique does have one very significant disadvantage. Unlike the NLB service, your DNS server doesn't monitor the health of the Exchange front end servers. This means that the DNS server will direct requests to a front end server whether it's online or not.
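That difference can be expressed as a one-line filter: NLB only hands requests to nodes that pass its health checks, while plain DNS round robin keeps rotating through every record regardless. A hypothetical sketch:

```python
# Hypothetical front end servers and whether each is currently online.
servers = {
    "192.168.0.10": True,   # online
    "192.168.0.11": False,  # failed
    "192.168.0.12": True,   # online
}

# DNS round robin has no health awareness: every record stays in rotation.
dns_targets = list(servers)

# NLB removes failed hosts from the rotation when the cluster converges.
nlb_targets = [ip for ip, online in servers.items() if online]

print(dns_targets)  # all three addresses, including the failed server
print(nlb_targets)  # only the two online servers
```

In practice this means that with DNS round robin, roughly one in N client requests will hang or time out for as long as one of your N front end servers is down.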

Front-end servers

Whether you're using the NLB service or the DNS round robin technique to load balance requests coming into your Exchange front end, you're eventually going to have to make a decision regarding how many servers to place on the front end. Microsoft's recommendation is that you have one front end server for every four backend servers. Of course, this recommendation is only a guideline. Both the NLB service and the DNS round robin technique require a minimum of two front end servers.

According to the Microsoft guidelines, balancing the inbound requests between two front end servers should be sufficient to accommodate about eight backend servers. Of course this will ultimately depend on the demand that users are placing on the front end servers. If your organization has a disproportionately high number of users working from remote locations, you may find that you need some additional front end servers to handle the workload. In case you're wondering, the maximum number of servers that you can use in an NLB cluster is 32.

If your organization happens to subscribe to Microsoft TechNet or MSDN, then check out System Center Capacity Planner. This tool allows you to create a virtual representation of your network. You can then run simulations against the model that you create in an effort to determine whether or not a proposed configuration will be adequate to handle the anticipated workload.

Back-end cluster configurations

When it comes to clustering back end Exchange Servers, there are two primary types of configurations available to you. These configurations are active/active and active/passive. Before I explain the differences between these two types of configurations, I need to familiarize you with some of the vocabulary associated with backend server clustering.

Individual servers within the cluster are referred to as nodes. Each node runs an instance of Exchange Server 2003. An instance of Exchange Server 2003 is called an Exchange Virtual Server.

An active/passive cluster configuration gets its name because one of the cluster's nodes is not running an Exchange virtual server. This particular node functions only as a hot standby server. This node is referred to as the passive node.

The passive node is only called into action if one of the active nodes fails. Some administrators tend to think of this as a tremendous waste of system resources. After all, what's the point of having a server that may never even get used? While I agree that it does seem a shame to have a server that is unused most of the time, think of the passive node as insurance that the cluster can continue to function even after a node fails. To see why this is such a big deal, let's talk about active/active clusters for a moment.

As you've probably already figured out, an active/active cluster is a cluster in which all nodes are actively running Exchange virtual servers. This allows Exchange Server to take maximum advantage of the cluster's resources in distributing the workload. If one of the cluster's nodes fails, it's up to the remaining node to pick up the slack.

In the paragraph above, you might have noticed that in the last sentence I said that it's up to the remaining node to pick up the slack, and not up to the remaining nodes to pick up the slack. The reason that I said this is because Microsoft limits active/active clusters to a maximum of two nodes. They do this to discourage organizations from deploying active/active clusters. In fact, active/active clusters will not even be allowed in Exchange Server 2007.

So why does Microsoft want to discourage the use of active/active clusters? The reason has to do with what happens during a failover. As I explained, if one node fails, the remaining node picks up the slack. The problem with this approach is that it's easy to overwhelm a two node cluster. Think about it for a second. If both nodes are operating at maximum capacity (or at near maximum capacity) and one of the nodes fails, the remaining node will not have sufficient resources to carry the workload of both servers. This could cause the second node to fail, which totally defeats the purpose of having a cluster in the first place. This is why Microsoft recommends using active/passive clusters.
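The failure mode is easy to see with numbers. Suppose both nodes of a hypothetical active/active pair run at 80% of a single node's capacity; after a failover, the survivor would need 160% of its own capacity, which it cannot deliver:

```python
def survivor_load(node_loads: list[float]) -> float:
    """Combined load the surviving node inherits when every other node
    fails, with each load expressed as a fraction of one node's capacity."""
    return sum(node_loads)

# Hypothetical active/active pair, each node at 80% utilization.
combined = survivor_load([0.80, 0.80])
print(f"survivor would need {combined:.0%} of one node's capacity")
print("overwhelmed" if combined > 1.0 else "ok")
```

The arithmetic only works out if each active node is kept below 50% utilization, which is itself a form of wasted capacity, just a less obvious one than a dedicated passive node.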

Now that I've explained the reasons why it's good to use an active/passive cluster, let's talk about the issue of having a server sit idle waiting for a failure to occur. Yes, having an idle server is a waste of resources. How big of a waste really depends on the size of your cluster, though. For example, if you have a two node active/passive cluster, then 50% of your cluster resources are being wasted. On the other hand, if you have a 20 node active/passive cluster, then only 5% of the cluster resources are being wasted. The key is to not think of the passive node as wasted resources, but rather as an investment in mail flow continuity.
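The "waste" shrinks as the cluster grows because it is simply the passive node count divided by the total node count. Using the two cluster sizes from the paragraph above:

```python
def idle_fraction(active: int, passive: int) -> float:
    """Share of cluster nodes sitting idle in an active/passive design."""
    return passive / (active + passive)

print(f"{idle_fraction(1, 1):.0%}")   # 50% idle in a two node cluster
print(f"{idle_fraction(19, 1):.0%}")  # 5% idle in a twenty node cluster
```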

Additional passive nodes

As I have already explained, the purpose of the passive node is that it can take over should an active node fail. What happens if two active nodes fail though? When the first active node fails, the passive node takes over for the failed node. When the second active node fails, there are no remaining passive nodes to take over for the failed server. Therefore, the second failed server will simply remain off-line.

The thing about an active/passive cluster is that you're not actually limited to a single passive node. You can easily create an active/passive cluster containing multiple passive nodes. For example, a common cluster configuration is a 5+3 active/passive cluster. What this means is that the cluster contains five active nodes and three passive nodes. In this type of configuration, up to three of the cluster's five active nodes can fail before Exchange availability is jeopardized.
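The failover behavior of an N+M cluster like this can be sketched as a counter: each failed active node consumes one passive node, and a virtual server only goes offline once the passive pool is exhausted. A toy simulation of the 5+3 example:

```python
def virtual_servers_online(active: int, passive: int, failed: int) -> int:
    """Exchange virtual servers still running in an N+M active/passive
    cluster after 'failed' active nodes go down: each failure is
    absorbed by a passive node until the standby pool runs out."""
    uncovered = max(failed - passive, 0)
    return active - uncovered

# The 5+3 cluster described above.
print(virtual_servers_online(5, 3, 3))  # 5: all virtual servers still up
print(virtual_servers_online(5, 3, 4))  # 4: one virtual server is down
```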

Unsupported Exchange components

Because of the special nature of an Exchange Server cluster, there are some Exchange Server 2003 components that simply will not function on a cluster node. This doesn't however mean that you can't run these components on other Exchange 2003 servers outside of the cluster.

The first component that is unsupported is the Active Directory Connector. This component is unsupported because clustering is primarily geared towards increasing the availability of mailbox servers. The Active Directory Connector, on the other hand, is used to keep the Exchange 5.5 directory synchronized with the Active Directory. Because the Active Directory Connector is not really an important component for a mailbox server, Microsoft chose not to make it cluster aware.

Just as the Active Directory Connector can't be installed on a cluster of Exchange Servers, there are some other types of connectors that also can't be installed on Exchange clusters. Specifically, the Exchange Calendar Connector, the Exchange Connector for Novell GroupWise, and the Exchange Connector for Lotus Notes are all incompatible with Exchange Server clusters.

Another component that is not supported on an Exchange Server 2003 cluster is the Intelligent Message Filter. The Intelligent Message Filter is a mechanism for preventing spam, and is typically installed on the bridgehead server.

The NNTP service is kind of a tricky one. The NNTP service is actually a part of IIS, and is a required Exchange Server component. You can't even install Exchange Server 2003 unless the NNTP service is installed first. However, once Exchange Server is configured to work as a cluster, the NNTP service ceases to function.

Yet another service that doesn't work in a clustered environment is the Microsoft Exchange Event service. As you may know, the Exchange Event service is designed to support server-side scripting agents that were developed for Exchange 5.5. This is a legacy component, and is therefore not supported in a clustered environment. If you have applications that require the Exchange Event service, you will have to host the service on a non-clustered server.

The last incompatible service that I want to talk about is the Site Replication Service. The Site Replication Service is designed to provide directory interoperability between Exchange Server 2003 and Exchange 5.5. Again, this is a legacy service and is not supported in a clustered environment.