Are there too many availability options for today's network workloads?

There are a number of ways to accomplish any task in IT. In the case of ensuring a workload is available, there are plenty of options. IT pro Rick Vanover asks if we are making too much work for ourselves.

Following last week’s blog post on determining whether or not every server should be made virtual, a similar argument can be made about the many availability solutions on offer for today’s systems.

Historically, I’ve found the best availability for mission-critical applications to come from a mix of the right tools. For example, DNS CNAME records can handle database moves or failovers when used in conjunction with database clustering and log shipping or replication. Databases and DNS entries are among the more critical components of an application, but there are plenty of other solutions to deliver availability to workloads.
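To make the CNAME trick concrete, here is a minimal sketch in Python. It does not talk to a real DNS server; a plain dictionary stands in for the zone, and every hostname is hypothetical. The point is only that applications keep using the alias while a failover repoints it.

```python
# A dictionary stands in for the DNS zone; all names are made up.
cname_records = {"db.example.internal": "sql-primary.example.internal"}


def resolve(alias, records):
    """Follow CNAME-style aliases until a name with no alias is reached."""
    seen = set()
    name = alias
    while name in records:
        if name in seen:
            raise ValueError(f"alias loop detected at {name}")
        seen.add(name)
        name = records[name]
    return name


def fail_over(alias, new_target, records):
    """Repoint the alias; clients keep connecting to the same alias."""
    records[alias] = new_target


before = resolve("db.example.internal", cname_records)
fail_over("db.example.internal", "sql-standby.example.internal", cname_records)
after = resolve("db.example.internal", cname_records)
```

In a real environment the repoint would be a DNS record update, and record TTLs and client-side caching would determine how quickly applications follow the alias to the new server.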

Additional options exist for virtual machines, where hardware is abstracted from the operating system to provide additional availability. One feature, VMware HA, provides a number of options to keep virtual machines available. Should a host fail, HA restarts its virtual machines on another ESX(i) host in the cluster. VMware HA can also monitor guest virtual machines for conditions that require a guest reset, such as a Windows blue screen of death (BSOD) event. Another key availability feature is VMware Fault Tolerance (FT). An FT virtual machine gains availability in that its network, processor and memory resources run simultaneously on two ESX(i) hosts. The storage remains single instance in this configuration, but storage systems usually build in additional layers of availability, including multipath configuration, RAID technologies, storage replication and more.

Within the operating system, there are plenty of options that can be leveraged on physical systems as well as virtual machines. These solutions include the DNS and database tricks mentioned earlier, as well as setting up Windows Cluster Services for Windows Servers, which can add availability within the operating system, including options that are application-aware.

All of these technologies are good, in fact great, for helping IT professionals like you and me deliver robust solutions. But the question becomes: how many levels of availability do we really need? The issue comes with troubleshooting. Each of these availability solutions has its own failure modes to diagnose, and each adds a layer of complexity that may simply cause administrators extra work.

The issue is that many of these availability solutions may be combined on workloads that need only one or two of them. The right way to decide which availability solutions a workload gets is to have its business requirements clearly identified, and to have a policy in place that documents which specific solutions are applied to workloads that require enhanced availability.
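One way to keep such a policy honest is to write it down in a form that can be checked. The sketch below is only an illustration: the tier names, the mapping of solutions to tiers and the sample workload are all assumptions, not a recommendation for any particular environment.

```python
from dataclasses import dataclass

# Hypothetical policy: the solutions each availability tier is allowed to use.
POLICY = {
    "standard": ["VMware HA"],
    "enhanced": ["VMware HA", "Windows clustering"],
    "mission-critical": [
        "VMware HA",
        "VMware FT",
        "Windows clustering",
        "database log shipping",
        "DNS CNAME alias",
    ],
}


@dataclass(frozen=True)
class Workload:
    name: str
    tier: str  # must be a key in POLICY


def solutions_for(workload):
    """Return the solutions the policy allows for this workload's tier."""
    if workload.tier not in POLICY:
        raise ValueError(f"unknown tier {workload.tier!r} for {workload.name}")
    return POLICY[workload.tier]


def unapproved(workload, deployed):
    """List deployed solutions that the policy does not call for."""
    allowed = set(solutions_for(workload))
    return sorted(set(deployed) - allowed)


wiki = Workload(name="intranet-wiki", tier="standard")
extra = unapproved(wiki, ["VMware HA", "VMware FT"])
```

Here `extra` flags the FT configuration on a workload whose documented requirement is only a restart on host failure, which is exactly the kind of over-layering that adds troubleshooting work without a business reason.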

How do you manage this moving target of increasing functionality while keeping the administrative complexity down? Share your comments below.