Windows 2000 Advanced Server and Datacenter Server both provide network load balancing to allow you to build a network architecture for your Web, VPN, Terminal Services, streaming media, and other IP-based services that provides fault tolerance and redundancy for mission-critical applications. In this Daily Drill Down, I’ll give you an overview of this technology to help you understand how it works and how to deploy network load balancing for testing and evaluation.

Network load balancing overview
Businesses increasingly rely on computers and intranet- or Internet-based services to support their daily operations and provide services to their customers. Whether you’re serving databases to your users or Web sites and other Internet-based services to the public or your customers, ensuring that those servers remain up 24 hours a day, seven days a week is a critical aspect of your business’ success. Ensuring that servers remain up and provide failover capability isn’t just for large organizations with several servers. Even small businesses that have only one server can benefit from network load balancing to secure against loss of business or productivity because of a failed server. If a single server handles all of your services, for example, consider the impact that the server going down would have on your business. Even if you use RAID arrays to protect against drive failures and faithfully perform backups and validate those backup sets, the occasional hardware failure still occurs. Can you afford to have your only server offline for a day or two while you rebuild it?

Windows 2000 Advanced Server and Windows 2000 Datacenter Server both provide two services, clustering and network load balancing, that let you distribute services across multiple servers. Together they reduce the load on any given server and keep a service available if a server goes down. Clustering provides failover support, allowing services to be available to users and customers even if a server goes down. Network load balancing allows you to distribute the load for services and also provides a measure of failover support.

Network load balancing allows you to scale your services as the load on the servers grows. For example, if you host your Web site on multiple servers in a network load balancing cluster, you can add additional servers to the cluster to handle higher volume traffic, improving response time. In addition, network load balancing allows the servers in the cluster to detect a server failure and automatically distribute client traffic to the other servers in the cluster. Network load balancing is designed to accomplish this failover and repartitioning within 10 seconds of the failure.

You can provide a measure of load balancing for certain IP-based services such as Web and FTP by using round-robin DNS, which uses multiple host records that resolve the same host name to different IP addresses (and therefore, different servers). For example, you might create four host records for www that map that host name to four different physical servers. As DNS queries come in, the DNS service answers them with alternating host records, so four consecutive queries for the same host (in this example) return four different IP addresses. The advantage of using round-robin DNS is that you can do it with Windows 2000 Server, as well as Advanced Server and Datacenter Server. The drawback is that round-robin DNS provides no redundancy. The servers have no way of knowing when one of them fails, and the DNS service continues to answer name queries with the IP address of the failed server. So, in this example, roughly one out of four attempts to connect to your Web site will fail. With true network load balancing, however, users would not experience a service disruption because the failed server would automatically be removed from the cluster and would no longer receive client traffic.
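The round-robin drawback described above can be illustrated with a short simulation. This is only a sketch: the addresses are hypothetical documentation addresses, and the rotation is simplified to a strict cycle through the four host records.

```python
from itertools import cycle

# Hypothetical addresses for the four "www" host records.
WWW_RECORDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]


def round_robin_answers(records, query_count):
    """Return the address a round-robin DNS server hands out for each query."""
    rotation = cycle(records)
    return [next(rotation) for _ in range(query_count)]


def failed_fraction(answers, down_servers):
    """Fraction of answers that point clients at a server that is down."""
    return sum(1 for address in answers if address in down_servers) / len(answers)


answers = round_robin_answers(WWW_RECORDS, 8)
# DNS has no idea that 192.0.2.12 has failed, so it keeps handing it out.
print(failed_fraction(answers, {"192.0.2.12"}))  # 0.25
```

Because the DNS service never learns about the failure, the failed address stays in the rotation until an administrator removes the record by hand.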

An important aspect of Microsoft’s network load balancing architecture is the ability to support rolling upgrades to minimize or eliminate service downtime. You can take a server out of a cluster, perform maintenance or upgrades on the server, and then let the server resume its role in the cluster. Users need not know that a server was ever offline. At most, they might see a slight decrease in performance, although you can eliminate this by adding a sufficient number of servers to the cluster to ensure that the removal of one server from the cluster does not cause the others to operate at maximum capacity. In addition, if you have existing clusters using Windows NT Load Balancing Service, you can perform rolling upgrades within the cluster to upgrade servers to Windows 2000 Server while still allowing them to participate in the cluster.

Network load balancing in a nutshell
When you set up a cluster, you assign it a primary IP address, which is the address on which all of the servers in the cluster respond to client requests. Each server can also have a dedicated IP address for noncluster traffic and traffic that is specific to that host in the cluster. Network load balancing does not balance traffic to the dedicated IP address; it balances only the traffic destined for the primary IP address. Outgoing connections use the dedicated IP address as the source address so that replies to those outgoing connections are not load-balanced and potentially redirected to a different host.

You assign a host priority value to each server in a cluster. The host priority number can range from 1 to 32, with lower numbers indicating higher priority. The host with the highest priority (the lowest number) becomes the default host and handles all traffic not intended for load balancing. Services that are not configured for load balancing are therefore handled by a single host. If the default host fails, the remaining server with the highest priority (the next-lowest number) assumes the role of default host and begins processing non-load-balanced traffic.
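The default-host rule can be sketched in a few lines. The host names and the `default_host` helper are hypothetical; the sketch only models the selection rule described above, not how the hosts actually negotiate it.

```python
def default_host(priorities, failed=frozenset()):
    """Pick the live host with the lowest priority number (highest priority)."""
    for host, number in priorities.items():
        if not 1 <= number <= 32:
            raise ValueError(f"{host}: priority {number} is outside 1-32")
    if len(set(priorities.values())) != len(priorities):
        raise ValueError("each host needs a unique priority number")
    live = {h: n for h, n in priorities.items() if h not in failed}
    return min(live, key=live.get) if live else None


cluster = {"web1": 1, "web2": 2, "web3": 3}
print(default_host(cluster))                   # web1
print(default_host(cluster, failed={"web1"}))  # web2
```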

Windows 2000 network load balancing broadcasts client requests, such as for a Web page, to all servers in the cluster. As mentioned above, each server in the cluster is assigned a priority number to identify it within the cluster, and each server is configured to handle a specific percentage of client traffic. When a client requests a Web page, for example, the corresponding HTTP requests are distributed across the cluster. Each server analyzes the traffic and then uses a randomized algorithm that determines which server should handle the traffic based on the client’s IP address, port, and other information. The identified server then processes the traffic while the others discard the packets. Load balancing applies at a granular level, so any number of servers in a cluster might process a particular client session. For example, several different servers in a cluster might handle a client’s request for a Web page, each serving different elements of the page to the client. Client-to-host mapping remains the same as long as membership in the cluster remains constant, so a given client always maps to a specific host until there is a change in cluster membership. You cannot, however, determine or specify which clients map to a specific host.
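The filtering idea, in which every host sees the packet and each one decides independently whether it is the owner, can be sketched as follows. This is an illustration under stated assumptions: the article does not describe the actual mapping algorithm, so a SHA-256 hash with a modulo stands in for it, and the host names are hypothetical.

```python
import hashlib


def owner(client_ip, client_port, hosts):
    """Map a client address and port to exactly one cluster host."""
    members = sorted(hosts)  # every host must see the same ordering
    key = f"{client_ip}:{client_port}".encode()
    digest = hashlib.sha256(key).digest()  # stand-in for the real mapping
    return members[int.from_bytes(digest[:8], "big") % len(members)]


def accepts(host, client_ip, client_port, hosts):
    """Each host runs this on every packet; only the owner keeps it."""
    return owner(client_ip, client_port, hosts) == host


hosts = ["web1", "web2", "web3"]
decisions = {h: accepts(h, "203.0.113.7", 1025, hosts) for h in hosts}
print(decisions)  # exactly one host reports True; the others discard the packet
```

Because every host computes the same result from the same inputs, no host needs to ask the others who should answer, and the mapping stays put until the list of members changes.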

You can configure network load balancing so that each server in the cluster handles the same share of the load; in other words, you can distribute the load equally among all hosts. Or, you can configure each server to handle a specific percentage of the load. Network load balancing distributes the load accordingly. The load changes when cluster membership changes, as in the case of a failed server. Although network load balancing in Windows 2000 takes cluster membership into account when allocating client traffic to the cluster, it does not take into account such server-specific issues as memory and CPU utilization.
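As a rough sketch of how configured percentages translate into actual shares, the function below renormalizes the weights over whichever hosts are still alive. The proportional redistribution is an assumption made for illustration, and the host names and the `effective_shares` helper are hypothetical.

```python
def effective_shares(load_weights, failed=frozenset()):
    """Return each live host's fraction of the load-balanced traffic."""
    live = {h: w for h, w in load_weights.items() if h not in failed and w > 0}
    total = sum(live.values())
    if total == 0:
        return {}
    return {host: weight / total for host, weight in live.items()}


weights = {"web1": 50, "web2": 30, "web3": 20}
print(effective_shares(weights))
# {'web1': 0.5, 'web2': 0.3, 'web3': 0.2}
print(effective_shares(weights, failed={"web3"}))
# {'web1': 0.625, 'web2': 0.375}
```

Note that nothing in the calculation looks at memory or CPU utilization; only the configured weights and cluster membership matter.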

Network load balancing uses a heartbeat signal between hosts in the cluster to monitor cluster integrity. If the cluster status changes (a host fails, is added, or is removed), the remaining hosts reconfigure the cluster accordingly in a process called convergence. Through this process, the remaining hosts exchange heartbeat messages to define the new cluster state and elect a new host as the default host. By default, the number of heartbeat messages that can be missed from a given host is five, and the default period between heartbeats is one second. You can change these values through the AliveMsgTolerance and AliveMsgPeriod registry values, respectively.
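The arithmetic behind failure detection is simple enough to sketch. The constants mirror the defaults quoted above; expressing the period in seconds and treating a host as failed once five heartbeats in a row are missed are simplifying assumptions, and the function names are hypothetical.

```python
ALIVE_MSG_PERIOD_S = 1.0   # default time between heartbeats, in seconds
ALIVE_MSG_TOLERANCE = 5    # default number of heartbeats that can be missed


def detection_time(period=ALIVE_MSG_PERIOD_S, tolerance=ALIVE_MSG_TOLERANCE):
    """Seconds of silence before the other hosts start convergence."""
    return period * tolerance


def is_considered_failed(last_heartbeat, now,
                         period=ALIVE_MSG_PERIOD_S,
                         tolerance=ALIVE_MSG_TOLERANCE):
    """True once `tolerance` consecutive heartbeats have been missed."""
    missed = int((now - last_heartbeat) // period)
    return missed >= tolerance


print(detection_time())                # 5.0
print(is_considered_failed(0.0, 4.9))  # False
print(is_considered_failed(0.0, 5.0))  # True
```

Lowering either value shortens detection time at the cost of more heartbeat traffic or a greater chance of declaring a busy host dead.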

Client affinity
Windows 2000 network load balancing offers a feature called client affinity that allows you to control how client traffic is distributed to the cluster’s servers. The three affinity modes are None, Single, and Class C affinity. When no affinity is used, network load balancing distributes client traffic from a given IP address and multiple ports on that address to various hosts in the cluster, providing the best response time to client requests. With single-client affinity, all client traffic from a given IP address is directed to a particular host in the cluster. With Class C affinity, all traffic from a given Class C address range is directed to a particular host in the cluster.
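One way to picture the three modes is as three different keys fed into the client-to-host mapping. The sketch below is an illustration only; the mode names and the `affinity_key` helper are hypothetical, and the addresses are documentation addresses.

```python
import ipaddress


def affinity_key(client_ip, client_port, mode):
    """Return the part of the client's address that decides its host."""
    ip = ipaddress.IPv4Address(client_ip)
    if mode == "none":
        return (str(ip), client_port)   # address and port both matter
    if mode == "single":
        return (str(ip),)               # port is ignored
    if mode == "class_c":
        network = ipaddress.IPv4Network(f"{ip}/24", strict=False)
        return (str(network.network_address),)  # only the first three octets
    raise ValueError(f"unknown affinity mode: {mode}")


print(affinity_key("203.0.113.7", 1025, "none"))     # ('203.0.113.7', 1025)
print(affinity_key("203.0.113.7", 1025, "single"))   # ('203.0.113.7',)
print(affinity_key("203.0.113.7", 1025, "class_c"))  # ('203.0.113.0',)
```

Two requests that produce the same key always land on the same host, so the coarser the key, the stickier the client.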

Client affinity becomes important in applications that require that the client session state be maintained. Connections to an e-commerce site, for example, generally need to remain with one host so the server can keep track of the client’s shopping cart. If the server goes down, that session data is lost, generally requiring the user to log on again to reestablish the session state data. Adding and removing servers in a cluster can also affect session state, which is why you should add and remove hosts in the cluster during off-peak periods. However, applications can use other methods to maintain session state, making client affinity unnecessary. For example, an e-commerce application could store the user’s shopping cart on the client’s computer using cookies to overcome the need to provide client affinity.

Class C affinity primarily addresses problems that arise when clients access the cluster’s services through a proxy server. All client traffic coming through the proxy server has the same source address, that of the proxy server, and that address determines which cluster host handles the traffic. With no affinity set, the traffic would be distributed across the cluster. With single-client affinity, all client requests from a given proxy server would be directed to the same cluster host. However, if the client’s network used multiple proxy servers, it’s possible that a given session could appear to come from multiple IP addresses because the traffic is routed through multiple proxy servers. Class C affinity helps overcome this problem and ensures that client requests from these types of clients are handled by the same cluster host.

With Class C affinity in use, all traffic from a given Class C address space is directed to a particular host. For this to have the desired impact, however, the client’s network must be configured so that all of the client’s proxy servers reside in the same Class C address space. In situations where the client’s network is very large, it’s possible that the proxy servers will be distributed across a broader address space.
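A quick check like the following shows when Class C affinity can and cannot keep a client’s proxy servers together. It is a minimal sketch; the `share_class_c` helper is hypothetical, and the proxy addresses are documentation addresses chosen for illustration.

```python
import ipaddress


def share_class_c(addresses):
    """True if every address falls in the same Class C (/24) range."""
    networks = {
        ipaddress.ip_network(f"{address}/24", strict=False)
        for address in addresses
    }
    return len(networks) == 1


# Two proxies in one Class C range: their traffic maps to one cluster host.
print(share_class_c(["198.51.100.4", "198.51.100.77"]))  # True
# Proxies spread across ranges: a session may be split between hosts.
print(share_class_c(["198.51.100.4", "203.0.113.9"]))    # False
```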

Choosing a network load balancing model
Another consideration when you begin thinking about deploying network load balancing is the network topology and network load balancing model you will use. First, a cluster host is not limited to providing services on a single network interface. Although you can assign only one primary IP address for an interface and have that address act as a virtual server address for clients, you can add other network interfaces to the server, each with its own primary IP address.

When you are setting up network load balancing, you can choose among four different models, depending on the number of network interfaces in the host and the way in which you want traffic distributed to the hosts in the cluster. These models are single-interface unicast, multiple-interface unicast, single-interface multicast, and multiple-interface multicast.

Each cluster can use either unicast or multicast addressing to distribute traffic. Unicast mode is the default and works with all routers. In unicast mode, the network adapter’s MAC address is disabled and replaced by a cluster MAC address that network load balancing generates automatically, so the host’s dedicated and primary IP addresses both resolve to the cluster MAC address. As a result, ordinary network traffic between cluster hosts is not possible over that adapter, although each host can still handle traffic that originates outside of its subnet, along with any local traffic that does not carry the cluster MAC address (such as traffic from noncluster hosts). These limitations apply when the host has only a single network interface.

If you add other network adapters to the host, the first interface is used as the cluster interface and the same conditions hold true: the adapter’s MAC address is disabled and replaced by the cluster MAC address, and the interface can handle traffic, whether from outside or inside the subnet, only when that traffic originates from a MAC address other than the cluster MAC address. A second adapter, however, can be used to handle nonclustered, host-to-host network traffic. In this case, the first adapter’s primary address and dedicated address both resolve to the cluster MAC address. The second adapter’s dedicated address resolves to its own MAC address. This second interface can then handle network traffic from both inside and outside of the subnet as if network load balancing were not in place.

Network load balancing can also use multicast addressing to distribute traffic to cluster hosts. When you use multicast, network load balancing generates a multicast MAC address automatically for the network adapter but retains the physical MAC address. The dedicated IP address for the interface resolves to the network adapter’s physical MAC address, and the primary IP address resolves to the cluster MAC address. Because of this, the interface can handle both cluster and noncluster traffic, allowing normal network traffic between the cluster hosts. If you install multiple network adapters and use multicast mode, the first adapter becomes the cluster adapter, and its primary IP address resolves to the cluster MAC address. Any dedicated adapters then handle noncluster traffic. It’s also important to understand the difference between a dedicated IP address and a dedicated adapter. The former applies only to the cluster interface. The latter applies to any additional adapter other than the cluster adapter.
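The difference between the two modes on a single-adapter host comes down to which MAC address each IP address resolves to. The sketch below models only that lookup; the MAC addresses are made-up placeholders, and the function names are hypothetical.

```python
def resolved_macs(mode, physical_mac, cluster_mac):
    """MAC address that each of a host's IP addresses resolves to."""
    if mode == "unicast":
        # The physical MAC is disabled, so both addresses use the cluster MAC.
        return {"primary": cluster_mac, "dedicated": cluster_mac}
    if mode == "multicast":
        # The physical MAC is retained for the dedicated IP address.
        return {"primary": cluster_mac, "dedicated": physical_mac}
    raise ValueError(f"unknown mode: {mode}")


def hosts_can_talk(mode, mac_a, mac_b, cluster_mac):
    """Two single-adapter hosts can exchange ordinary traffic only if
    their dedicated IP addresses resolve to different MAC addresses."""
    a = resolved_macs(mode, mac_a, cluster_mac)["dedicated"]
    b = resolved_macs(mode, mac_b, cluster_mac)["dedicated"]
    return a != b


CLUSTER_MAC = "cluster-mac"            # placeholder values, not real addresses
HOST_A, HOST_B = "nic-a-mac", "nic-b-mac"
print(hosts_can_talk("unicast", HOST_A, HOST_B, CLUSTER_MAC))    # False
print(hosts_can_talk("multicast", HOST_A, HOST_B, CLUSTER_MAC))  # True
```

Adding a second adapter in unicast mode gives each host a dedicated address with its own MAC, which is what restores host-to-host traffic in that configuration.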

Network load balancing doesn’t support a mixed mode; all hosts in a cluster must use either unicast or multicast mode. You can use any number of network adapters in a host, although only one should have load balancing enabled. The mode you choose depends on whether or not you need noncluster host-to-host communication (such as to a back-end database server), whether all your routers support multicast MAC addresses, how many network adapters you want to install per host, and other considerations related to your network’s topology, including the effect of the traffic on switches in the network.

Conclusion
As network traffic grows, you can deploy ever-bigger servers to accommodate it, but that’s only going to work for so long. Fortunately, Windows 2000 supports network load balancing, which allows you to spread network traffic across multiple servers on your network.