Meeting the demand for high availability while balancing costs and IT resources is a major challenge. One solution that can help with some types of servers and applications is APCON PowerSwitch, as this tutorial demonstrates.
APCON PowerSwitch software allows administrators to implement a monitored automatic failover. This can be viewed as an alternative to a software cluster when architecture requirements, budget, or application support make clustering impractical.
APCON provides physical layer switches for various interfaces, including the SCSI interface I describe in this article. Based in Portland, Oregon, the company offers software and hardware solutions to maximize system availability. It's worth noting that APCON's technical support provided clear clarification on some of the topics in this article.
What is an automatic failover?
In this context, an automatic failover is a system implementation in which a "monitoring" system watches at least one primary system for failure. The monitoring system is configured identically to the monitored system, so it can access the external storage (usually a drive cabinet connected to an array controller's external port) and boot up as that instance of the monitored system.
An APCON automated failover can be implemented with many architecture types. This functionality is most clearly illustrated with a simple example, which I'll refer to throughout this article. I've deployed this type of configuration professionally when we wanted a "warm" automatic backup for a mission-critical system. "Warm" means the backup system is current (database contents in particular) but takes a moment to come online, as opposed to a "hot" backup, which is available immediately.
My basic example features the following:
- Two identical servers running a database or another mission-critical application
- An external hard drive cabinet
- Hardware support for multiple boot device sequencing
- Another instance of Windows 2000 running APCON PowerSwitch
- The APCON ACI 2102 SCSI Electronic Crosspoint Switch
The servers in this example are HP ProLiant DL 380 G3 machines. This model comes standard with a Smart Array 5i controller, and our configuration adds a Smart Array 642 controller to connect to the external drive cabinet containing the RAID array. The external drive cabinet houses the drive array and can be controlled, via the switch, by either of the two DL 380 servers. Each system has an internal hard drive system (controlled by the Smart Array 5i controller), which is configured to run Windows 2000 Professional and the APCON PowerSwitch software. Figure A illustrates this architecture.
One note on the configuration above: either DL 380 can act as the monitoring or the monitored server; this will be a factor later, when I explain licensing. An automated failover solution such as this does not require that every computer have both an internal and an external drive. Just keep in mind that a system without an internal drive can only be the monitored system, never the monitoring system.
The nuts and bolts
The APCON PowerSwitch software resides on the monitoring machine, and its corresponding Windows PowerSwitch service runs on the mission-critical system. At startup, the service on the mission-critical system retrieves basic configuration data from a shared folder on the monitoring system. For this to work, it is important that the two systems, the mission-critical system and the monitoring system, have identical sets of administrative credentials. Once the configuration file has been retrieved, the automatic failover can be enabled on the monitoring system. Figure B is from a monitoring system with a failover enabled to the monitored system.
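PowerSwitch's actual configuration file format is not documented here, but the retrieval step can be sketched generically. In the following sketch, the INI layout, section name, and key names are all hypothetical illustrations of the idea, not APCON's real file format:

```python
import configparser

def load_failover_config(text: str) -> dict:
    """Parse an INI-style failover configuration retrieved from the
    monitoring system's shared folder (the format here is hypothetical)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "monitoring_host": parser.get("failover", "monitoring_host"),
        "probe_interval_sec": parser.getint("failover", "probe_interval_sec"),
    }

# In practice, the monitored system's service would read this file from a
# UNC share on the monitoring system at startup, which is why both systems
# need identical administrative credentials.
```

Reading the file over a share keeps a single authoritative copy of the configuration on the monitoring system, so the monitored system always starts with current settings.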
As a matter of practice, it's important to visibly distinguish the monitoring system from the mission-critical system (or any other system). Accessing the computers through a KVM, terminal services, or other means can easily confuse the administrator. In my automatic failover implementations, I set the desktop background of the monitoring system to a bright orange color and left the default Windows background on the mission-critical system.
When the failover is enabled, the monitoring system constantly communicates with the monitored system over the Ethernet network. This communication does not monitor the SCSI bus; instead, it can be configured to use a network ping or a series of disk writes to determine whether the monitored system is online. The criteria that constitute a system failure are also configurable, and they deserve careful tuning for your implementation: a false positive would start an automatic failover without an actual need. Figure C shows some of the configuration options within the PowerSwitch Administrator software.
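PowerSwitch's internal detection logic is proprietary, but the general principle of avoiding false positives, declaring a failure only after several consecutive failed probes, can be sketched as follows (the class name and threshold are illustrative, not APCON's API):

```python
class FailureDetector:
    """Declare a failure only after `threshold` consecutive failed probes,
    reducing the chance of a false-positive failover (illustrative sketch)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result (ping or disk-write check).
        Return True once the failure threshold has been reached."""
        if probe_ok:
            # Any successful probe resets the count, so a single dropped
            # ping does not trigger a failover.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

A higher threshold (or a longer probe interval) trades slower failure detection for fewer spurious failovers, which is exactly the tuning decision the PowerSwitch configuration options expose.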
On the DL 380 G3, the boot controller order is a frequent source of misconfiguration. It should specify the Smart Array 642 controller in slot X as the first hard drive boot device and the Smart Array 5i controller as the second. In this configuration, the monitored system boots as the mission-critical system: the BIOS initializes the Smart Array 642 first, finds the external logical drives, and boots from them.
The monitoring system has the same configuration. Because the APCON switch is set to give the monitored system access to the external drive array, the monitoring system initializes its Smart Array 642 controller but finds 0 logical drives. The system then proceeds to initialize the integrated Smart Array 5i controller, finds the internal drive configuration, and boots that instance of the monitoring system.
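The boot fallback described above can be illustrated with a small sketch. The function below is an illustration of the behavior, not actual firmware logic; the controller names come from this article's configuration:

```python
def select_boot_controller(controllers):
    """Return the first controller in boot order that reports at least one
    logical drive, mirroring the DL 380 BIOS fallback described above.

    `controllers` is an ordered list of (name, logical_drive_count) pairs,
    listed in the configured boot controller order.
    """
    for name, logical_drives in controllers:
        if logical_drives > 0:
            return name
    return None  # no bootable hard drive found
```

On the monitoring system, the Smart Array 642 reports zero logical drives because the switch routes the external array to the monitored system, so boot falls through to the Smart Array 5i; on the monitored system, the 642 finds the external array and boots from it.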
Transactional software concerns and licensing
Consideration needs to be given to the scenarios that can initiate an automatic failover. In Windows 2000, the following are frequent causes:
- Windows Blue Screen of Death (BSOD)
- Inadvertent shutdown
- Network interface failure on the mission-critical system
When such events occur, the automatic failover is initiated and the next bootable system picks up where the "failed" system left off. If you run a server-side database or transactional e-mail system, it may be adversely affected by the equivalent of a hard shutdown. If the system comes up and recovers its transactions without incident, the only noticeable problem would be a few minutes of unavailability.
Licensing an automatic failover in a Microsoft Windows environment as I described in this article would include one instance of Windows 2000 with all of the server applications, two instances of Windows 2000 Professional to be used to monitor the server systems, and the PowerSwitch software. Other third-party applications that reside on each instance of the operating systems would need to be licensed as well.
How is this different from a cluster?
Cluster installations offer a different architecture than an automatic failover; specifically, clustering provides transparent failover of a contributing system. The automatic failover architecture built around an APCON PowerSwitch solution can recover from a hardware failure in minutes, costs less than a cluster, and requires less staff expertise than cluster-aware hardware and software.
An automatic failover can also be implemented in architectures that are more cost-effective than a traditional cluster configuration. For example, the APCON ACI 2028 6X4 SCSI Electronic Crosspoint Switch is a flexible device that offers many configuration options. Consider the following: if you want automatic failover for four mission-critical systems, you need only one monitoring system (saving hardware and software costs). That single monitoring system can monitor all four mission-critical systems and come online as any one of them if a failure is detected. This requires a total of five systems of the same configuration, with four external storage units available to the failover process. While this example is more complex than the scenario described above, it is an option for meeting the growing pressure to balance costs, performance, uptime, and resources.
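Extending the single-target idea, one monitoring system tracking several hosts can be sketched as follows. The host names, threshold, and function name are hypothetical illustrations, not part of the PowerSwitch product:

```python
def update_and_check(failure_counts, probe_results, threshold=3):
    """Update per-host consecutive-failure counts from one probe round and
    return the hosts that have crossed the failover threshold.

    `failure_counts` is a dict the caller keeps between rounds;
    `probe_results` maps each monitored host to True (probe succeeded)
    or False (probe failed). Illustrative sketch only.
    """
    failed = []
    for host, ok in probe_results.items():
        if ok:
            failure_counts[host] = 0  # success resets the count
        else:
            failure_counts[host] = failure_counts.get(host, 0) + 1
            if failure_counts[host] >= threshold:
                failed.append(host)
    return failed
```

In the four-system scenario, the monitoring system would run one probe round per interval across all four hosts; since it can boot as only one failed system at a time, it would act on the first host returned.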
If you need to implement a high availability solution for mission-critical systems, but lack some of the IT personnel resources to manage it, or if you want to avoid some of the additional licensing costs that often accompany such solutions, then you should consider an APCON automatic failover solution. With a little lab time and some supported hardware, you can get a good understanding of the solution as an alternative to a clustered environment, and you can potentially save money when comparing the APCON solution to other high availability options.