By Mike Talon

When crafting your organization’s disaster
recovery plan, one of the first things you must decide on is the
recovery time objective (RTO): the amount of
time that can pass between the failure of an application and the moment it becomes available to end users again
after recovery.

I want to point out two things to keep in mind.
First, you need to set these time measurements on an
application-by-application basis, not on a server-by-server basis.
Second, the clock starts when the application goes offline, and it
doesn’t stop until end users can effectively use the system
again.
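
As a rough illustration of the second point, the sketch below compares measured downtime with an RTO. The function name rto_met and all of the timestamps are hypothetical, chosen only to show that the clock stops when users can work again, not when the server comes back up.

```python
from datetime import datetime, timedelta


def rto_met(offline_at, usable_again_at, rto):
    """Return True if the outage stayed within the recovery time objective.

    Downtime runs from the moment the application went offline to the
    moment end users could effectively use it again.
    """
    downtime = usable_again_at - offline_at
    return downtime <= rto


# Hypothetical outage timeline (illustrative values only).
offline = datetime(2024, 1, 15, 9, 0)          # application fails
server_up = datetime(2024, 1, 15, 10, 30)      # server restored, app not yet usable
users_working = datetime(2024, 1, 15, 11, 15)  # end users can work again

rto = timedelta(hours=2)

# Measuring to the wrong endpoint makes the recovery look successful.
print(rto_met(offline, server_up, rto))
# Measuring to the point where users can work shows the RTO was missed.
print(rto_met(offline, users_working, rto))
```

In this made-up timeline the server returns after 90 minutes, but users wait 2 hours and 15 minutes, so a two-hour RTO is missed.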

Properly determining the RTO for any individual
application might seem like a simple task: just ask the appropriate
users of the data system how long they can conceivably live without
the application. But in reality, this can be more of an art than a
science, especially since end users of even the most mundane system
will typically declare they can’t go without it for more than a few
seconds.

Often, getting users to agree to a larger
margin of flexibility requires explaining that it would cost
roughly the debt of a small nation to ensure such availability.
Once you manage to get end users to decide on a reasonable estimate
of how long they can go without their applications, you must
combine the individual estimates for all of the applications
that share a single server.

In the Windows world, where there’s often
only one large application on a server, this may be easy. However,
in the Linux and UNIX worlds (including Solaris), there could be
dozens of applications on each server.

You must first determine if any application
takes precedence over the others on the same machine. If that’s the
case, use that application’s RTO, since it’s the most important and
will most likely have the shortest RTO.

If that isn’t the case, you must determine the
best mix between importance and recovery time, and you need to
select an RTO that meets that analysis. If your budget allows, I
recommend going with the shortest RTO on the machine. But if your
budget is tight, consider using the mean of all the
estimates.
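
The selection rules above can be sketched in a few lines of Python. This is only one possible reading of them: the function name server_rto, its parameters, and the application names and minute values are all assumptions made for illustration.

```python
from statistics import mean


def server_rto(app_rtos_minutes, priority_app=None, budget_tight=False):
    """Pick one RTO (in minutes) for a server that hosts several applications.

    app_rtos_minutes: dict mapping application name to its agreed RTO.
    priority_app:     name of an application that takes precedence, if any.
    budget_tight:     if True, fall back to the mean of all estimates.
    """
    if not app_rtos_minutes:
        raise ValueError("at least one application RTO is required")
    if priority_app is not None:
        # One application outranks the rest, so its RTO governs the server.
        return app_rtos_minutes[priority_app]
    if budget_tight:
        # Compromise when the shortest RTO is not affordable.
        return mean(app_rtos_minutes.values())
    # Budget allows: protect the server to the shortest RTO it hosts.
    return min(app_rtos_minutes.values())


# Hypothetical estimates gathered from end users, in minutes.
apps = {"billing": 60, "reporting": 240, "wiki": 480}

print(server_rto(apps, priority_app="billing"))
print(server_rto(apps))
print(server_rto(apps, budget_tight=True))
```

Keep in mind that a mean leaves the applications with the shortest estimates under-protected, so it is worth recording which ones fall short.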

Once you’ve determined the RTO, you can combine
it with other factors such as recovery point objectives (RPOs) to
determine which hardware and software solutions offer the closest
match for your company’s needs within its budget. This should give
you adequate protection, budget permitting.
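
One way to picture that matching step is the sketch below. It assumes each candidate solution can be described by a recovery time, a data-loss window, and a cost; the Solution class, the closest_match function, the product categories, and every figure are hypothetical and are not vendor data.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    recovery_minutes: int   # how long recovery takes (compare with RTO)
    data_loss_minutes: int  # how much recent data may be lost (compare with RPO)
    cost: int               # total cost in whatever currency you budget in


def shortfall(solution, rto, rpo):
    """Minutes by which a solution misses the RTO and RPO (0 if it meets both)."""
    return (max(0, solution.recovery_minutes - rto)
            + max(0, solution.data_loss_minutes - rpo))


def closest_match(solutions, rto, rpo, budget):
    """Return the affordable solution that misses the targets by the least.

    Ties are broken by lower cost. Returns None if nothing fits the budget.
    """
    affordable = [s for s in solutions if s.cost <= budget]
    if not affordable:
        return None
    return min(affordable, key=lambda s: (shortfall(s, rto, rpo), s.cost))


# Hypothetical candidates with made-up figures.
candidates = [
    Solution("nightly tape backup", 1440, 1440, 5_000),
    Solution("disk-based replication", 120, 15, 40_000),
    Solution("clustered failover", 5, 0, 150_000),
]

choice = closest_match(candidates, rto=60, rpo=30, budget=50_000)
print(choice.name, shortfall(choice, rto=60, rpo=30))
```

A nonzero shortfall is worth writing down, because it records exactly how far the affordable option falls short of what the organization asked for.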

If not, you’ll at least have a paper trail that
clearly shows what you needed to protect the organization and that
the lack of budget is why the plan fell short. Collecting and analyzing your
RTO numbers is a great first step in establishing a disaster
recovery plan, and it’s one that’s absolutely necessary.

Mike Talon is an IT consultant and freelance journalist who has worked for both traditional businesses and dot-com startups.