To minimize downtime when one of your application servers crashes, you should configure a fast, reliable, and automatic method of offloading client requests to another machine. The following tips can help you keep the rerouted requests transparent to the end user or application making the call. With a failover system in place, you can minimize, if not completely eliminate, critical server downtime.

Decouple the client interface and the server
Think of your application server as a modular component in the system. If the client interface and server are tightly coupled, rerouting client requests becomes unnecessarily difficult.

When you decouple your interface and your server, the interface must maintain the state of all requests sent to the server. (We’ll examine managing requests a little later.) Once decoupled from the interface, the server is free to respond to requests for services through a standardized message system.

After decoupling, the same interface can interact with multiple servers in a load-balancing capacity. This approach yields both scalability and, more importantly, the ability to fail over from one server to another in the event of a server outage.

Balance the load
With multiple application servers available for servicing client requests, the interface becomes responsible for determining which machine should handle each request. Methods of load balancing range from a simple round-robin approach (try this server, then that one, then the next one) to configuring servers to dynamically report their capacity and current load to the interface so that it can make an informed choice.
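The two ends of that range can be sketched in a few lines of Python. This is only an illustration: the server names and the load figures (0.0 for idle, 1.0 for full) are made up, and a real interface would obtain load reports from the servers themselves.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless rotation over the servers, ignoring their load."""
    return cycle(servers)


def least_loaded(reported_load):
    """Pick the server that reports the lowest load (hypothetical 0.0-1.0 scale)."""
    return min(reported_load, key=reported_load.get)


# Round robin: try this server, then that one, then the next one.
rotation = round_robin(["app1", "app2", "app3"])
first_four = [next(rotation) for _ in range(4)]

# Load-aware: the servers have told the interface how busy they are.
choice = least_loaded({"app1": 0.72, "app2": 0.35, "app3": 0.50})
```

Round robin needs no information from the servers, which makes it simple but blind to a machine that is already struggling. The load-aware version depends on the servers reporting accurately and often enough.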

Under optimal circumstances, load balancing between servers is merely a performance issue, but when planning for failover, the interface must ensure that the remaining machines can handle the failure of one server. In a two-server system, each machine should operate at 50 percent or less capacity, so if one fails, the other can assume the load. As more servers are added, each one can safely run closer to its capacity: with three evenly loaded servers, for example, each can run at up to about 67 percent, because the two survivors split the failed machine's share. With failover systems, you are normally concerned only with the unexpected loss of a single server at a time.

Obviously, if you are planning for simultaneous failure of multiple servers, you will have to adjust the number of servers managing the failover.
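The arithmetic behind these capacity limits is simple enough to write down. The sketch below assumes identical servers, an evenly balanced load, and even redistribution of a failed server's work across the survivors. The function name and parameters are illustrative.

```python
def max_safe_utilization(n_servers, tolerated_failures=1):
    """Highest per-server utilization (0.0-1.0) that still leaves room
    for the survivors to absorb the load of the failed servers.

    Assumes identical servers and an evenly distributed load.
    """
    if not 0 <= tolerated_failures < n_servers:
        raise ValueError("tolerated_failures must be in [0, n_servers)")
    return (n_servers - tolerated_failures) / n_servers


two_servers = max_safe_utilization(2)        # 0.5, the 50 percent rule
four_servers = max_safe_utilization(4)       # 0.75
double_failure = max_safe_utilization(5, 2)  # five servers, two may fail
```

Planning for two simultaneous failures in a five-server system lowers the ceiling to 60 percent per machine, which is why tolerating more failures means either adding servers or running each one lighter.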

To prevent the complete breakdown of interface-server communication, the interface should be able to manage a certain number of requests in a queue, waiting for an available server if maximum capacity is reached.
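One way to picture such a holding queue is sketched below. The class name, the per-server limit, and the queue limit are all assumptions made for the example. A real interface would also need timeouts and thread safety, which are left out here.

```python
from collections import deque


class Dispatcher:
    """Hands requests to servers and queues the overflow (illustrative only)."""

    def __init__(self, servers, per_server_limit, max_queued):
        self.active = {server: 0 for server in servers}  # requests in flight
        self.per_server_limit = per_server_limit
        self.max_queued = max_queued
        self.waiting = deque()

    def submit(self, request):
        """Return the server given the request, or None if it was queued."""
        for server, count in self.active.items():
            if count < self.per_server_limit:
                self.active[server] += 1
                return server
        if len(self.waiting) >= self.max_queued:
            raise OverflowError("queue full; request rejected")
        self.waiting.append(request)
        return None

    def complete(self, server):
        """Record that `server` finished a request. If anything is waiting,
        hand the oldest queued request to that server and return it."""
        self.active[server] -= 1
        if self.waiting:
            self.active[server] += 1
            return self.waiting.popleft()
        return None
```

Bounding the queue is a deliberate choice: once every server is full and the queue is full too, rejecting a request immediately is usually better than accepting work that cannot be serviced in a reasonable time.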

You gotta have heart
For the front end to know when it must shift the load of a failed server to the others, a mechanism must be in place to tell it when a server has failed. The most common approach is to establish a “heartbeat” between the interface and the server. The interface sends a regular message to each of the servers in the system, and each responds with a reply, which serves as that server’s heartbeat. The message can take a variety of forms, such as a ping to the server or a synchronization of client-server times.

If a server fails to reply after a predetermined number of attempts, the front end assumes that the server has failed and that any unhandled requests that have been sent to that server will not be handled. At that point, the interface sends those requests to the other available servers.
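The missed-reply counting described above might look like the following sketch. The probe function is a stand-in: here it is any callable that returns True when a server answers, whereas a real system would send a ping or similar message over the network. The class name and the default threshold of three are assumptions.

```python
class HeartbeatMonitor:
    """Tracks consecutive missed heartbeats per server (illustrative only)."""

    def __init__(self, servers, probe, max_missed=3):
        self.probe = probe            # hypothetical hook: callable(server) -> bool
        self.max_missed = max_missed  # misses in a row that count as a failure
        self.missed = {server: 0 for server in servers}
        self.failed = set()

    def tick(self):
        """Send one heartbeat to every live server.

        Returns the servers newly declared failed on this round, so the
        caller can resend their unhandled requests elsewhere.
        """
        newly_failed = []
        for server in self.missed:
            if server in self.failed:
                continue
            if self.probe(server):
                self.missed[server] = 0  # any reply resets the count
            else:
                self.missed[server] += 1
                if self.missed[server] >= self.max_missed:
                    self.failed.add(server)
                    newly_failed.append(server)
        return newly_failed
```

In use, the interface would call `tick()` on a timer. The interval between calls and the value of `max_missed` together determine how long a dead server can go unnoticed.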

How often these messages are sent to the servers and how many missed responses constitute a server failure should be defined by the needs of the individual application. For instance, a telephony server, where the client wants to establish some sort of telephone connection, might require a response within seconds to prevent the client from disconnecting. On the other hand, an e-mail server might require a response only every few minutes. Once the front end has accepted the e-mail, the end user is unlikely to notice a few minutes of delay while the system determines that a server has failed and fails over to another server.

The “heartbeat scheme” outlined above might not be an architect’s first choice because it requires implementation of a process that is asynchronous with client requests. A similar approach, where the server simply acknowledges that it has received and is handling the client request, may also be used. However, the primary drawback of this approach is that a single missed acknowledgement causes the request to be sent again, even if the original is actually being handled. In some cases, this redundancy won’t cause any trouble. But in other cases, such as adding a duplicate record to an invoice database, it quickly becomes problematic.

Managing the state of the requests
As mentioned previously, to decouple the interface and the server, you must make sure that the interface retains the state of the data for all requests being handled. If the state of the data is preserved, requests can be resent to other available machines when a single server fails. Thus, the nature of the request that was made, along with any parameterized data that was passed from the client, must be maintained until the server responds to the request.

Setting up a request queue can assist in maintaining pertinent state data. Client requests are received by the interface from the client and placed into the queue. As the requests are assigned to their application servers, they are flagged as “in process.” The queue record tracks the server that is handling the request. Once the server has handled the client request, the record is removed from the request queue. If at any time a server fails, the queue will be reprocessed, and consequently, all requests assigned to the failed server will be reassigned.
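A minimal sketch of such a queue follows. The record layout and all names are assumptions made for illustration. The key idea from the text is that each record keeps the original request data together with the server handling it, so the work can be handed to another machine if that server fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueRecord:
    request_id: int
    payload: dict                 # the request and its parameters, kept until done
    server: Optional[str] = None  # None = waiting; a name = "in process" there


class RequestQueue:
    def __init__(self):
        self.records = {}
        self._next_id = 1

    def enqueue(self, payload):
        """Store a client request and return its queue id."""
        record = QueueRecord(self._next_id, payload)
        self.records[record.request_id] = record
        self._next_id += 1
        return record.request_id

    def assign(self, request_id, server):
        """Flag the request as in process on `server`."""
        self.records[request_id].server = server

    def complete(self, request_id):
        """The server has handled the request; drop its record."""
        del self.records[request_id]

    def reassign_from(self, failed_server, pick_server):
        """Move every request held by `failed_server` to a server chosen by
        `pick_server()`; return the ids of the requests that were moved."""
        moved = []
        for record in self.records.values():
            if record.server == failed_server:
                record.server = pick_server()
                moved.append(record.request_id)
        return moved
```

Because the payload stays in the record until `complete` is called, nothing about the request has to be recovered from the failed machine. Completed requests are already gone from the queue, so they are never resent.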

Wrapping it up
Decoupling the front-end interface from the server, balancing the request load, and protecting the state of the data provide the foundation for implementing a robust and reliable failover system. Future articles will drill down on designing message queues and other methods for testing server availability.