In disaster recovery planning, many
organizations include plans to restore data from tape and other
point-in-time copies to new hardware. Likewise, many larger firms’
DR plans include preparations to fail over operations to another
data center if the primary facility becomes inoperable.
However, very few organizations actively
prepare to handle the loss of a single critical system, one that
has a recovery time objective (RTO) of less than two hours, while
the remainder of the production facility continues to function. DR
plans that fail to address limited but critical disasters such as
this risk letting the organization down over an outage that is
often avoidable.
As with most aspects of DR planning, there’s
more than one way to handle this type of outage. Most possible
solutions fall into one of two categories: local High Availability
(HA) systems or off-site Remote Availability (RA) systems.
By design, HA solutions allow one server or
system to stand in for another almost immediately, typically
within a few minutes of the outage being detected.
These systems offer much faster recovery
times than a restore from tape, but at the cost of flexibility. HA
almost always means failing over to a system in the same physical
location, which is necessary to preserve the IP subnet and other
settings required for immediate failover.
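
To illustrate the subnet constraint, here is a minimal Python sketch that checks whether a primary server and its standby share a subnet. The function name and all addresses are hypothetical and chosen only for this example. It is a rough pre-check, not a full test of HA readiness.

```python
import ipaddress


def can_fail_over_locally(primary_ip: str, standby_ip: str, subnet: str) -> bool:
    """Return True when both hosts sit inside the given IP subnet."""
    network = ipaddress.ip_network(subnet)
    return (
        ipaddress.ip_address(primary_ip) in network
        and ipaddress.ip_address(standby_ip) in network
    )


# Hypothetical addresses, for illustration only.
print(can_fail_over_locally("10.1.20.15", "10.1.20.16", "10.1.20.0/24"))  # True
print(can_fail_over_locally("10.1.20.15", "10.9.4.16", "10.1.20.0/24"))   # False
```

A standby that fails this check sits on a different network, so it is more likely a candidate for an RA-style failover than for immediate local failover.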
You can configure some applications for
many-to-one failover locally, and many clustering solutions also
support failover within the same physical site. Because several
systems can share one standby, this helps you stay within your
budget while still protecting against limited-scale disasters.
RA solutions offer the same type of recovery,
but the term generally refers to systems that fail over to another
physical location. Since this usually also means different networks
and subnets, you won’t be able to fail over every application
within a two-hour RTO using RA systems, but you can protect the
majority of your systems.
An RA system provides failover options for both
single-system failures and data-center-wide disasters, which limits
the amount of money you’ll need to spend on protection. However,
keep in mind that end users will have to access data over slower
WAN links, and they may need to reconfigure client-side
applications in the event of a failure, even if the remainder of
your production facility is still functioning.
Restoring critical systems when a complete
failure hasn’t occurred is a balancing act. You must recover many
of these systems within a short RTO that usually won’t allow for a
tape restore to new hardware.
But protecting each system both locally and
remotely may prove too expensive for your budget. Remember that you
can phase in these systems over time, beginning with the most
critical data systems and working outward.
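
One way to picture a phased rollout is to sort systems by RTO and fill each budget cycle in that order. The Python sketch below is only an illustration: the system names, RTO values, costs, and the per-phase budget are all invented, and a real plan would weigh more factors than RTO alone.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    rto_hours: float      # recovery time objective
    protection_cost: int  # assumed cost of adding HA or RA protection


def plan_phases(systems, budget_per_phase):
    """Group systems into phases, most critical (smallest RTO) first.

    A system that costs more than the budget still gets a phase of its own.
    """
    phases, current, spent = [], [], 0
    for s in sorted(systems, key=lambda s: s.rto_hours):
        # Start a new phase when this system would exceed the phase budget.
        if current and spent + s.protection_cost > budget_per_phase:
            phases.append(current)
            current, spent = [], 0
        current.append(s.name)
        spent += s.protection_cost
    if current:
        phases.append(current)
    return phases


# Hypothetical inventory, for illustration only.
inventory = [
    System("intranet", 24, 10),
    System("order-db", 1, 40),
    System("file-share", 8, 20),
    System("mail", 2, 30),
]
print(plan_phases(inventory, budget_per_phase=60))
# [['order-db'], ['mail', 'file-share', 'intranet']]
```

Here the system with the tightest RTO is protected in the first phase, and the less critical systems follow once budget allows.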