The word “triage” is usually associated with
hospitals and battlefields. It refers to a disastrous situation in which there
are many people who need medical attention, but there aren’t enough doctors or
medical supplies to go around. The victims are therefore sorted according to
whose injuries are the most serious or according to who could get the most
benefit from immediate treatment.
Obviously, this isn’t a situation that you want to find
yourself in. However, triage is also sometimes necessary in information
technology. Imagine that the building that you work in was destroyed by fire,
flood, tornado, runaway bulldozer, bratty kid, or some other destructive force.
If you’re prepared, you have a backup data center that has instantly taken over
and the company is still online. If not, it’s up to you to rebuild the network
in an alternate location.
The problem is that rebuilding a network from scratch is a
time- and resource-intensive process. It’s impossible to just plug in a few computers
and resume operations as though nothing ever happened. Instead, you need to
prioritize the rebuilding process in terms of which systems are the most
critical to the business.
Plan before you have to act
Before I explain how to make that determination, I want to
talk about why this is so important. I own several different businesses and
have had disasters occur on two separate occasions. In one instance, my
business’s e-mail server caught fire. This meant that I could not receive
or answer customer questions, nor could I receive notices of the orders
that came in from my Web site.
The other disaster involved my domain
registration expiring. Although this was not a physical disaster, I had problems
reestablishing the domain name. My site was down for almost a week.
In both instances, my business was either crippled or
completely shut down for a few days. During this time, I was losing a
considerable amount of money each day because customers were unable to
order products from my Web site or I was unable to respond to customer
questions. Since this was a small business with a slim profit margin, a few
more days of being offline could have caused the business to go bankrupt.
Even when everything was repaired and back to normal,
there was long-term damage to the business. The customers who were lost during
that time will never come back. I also lost my position within the search
engine rankings because the search engine spiders could not locate my site. It
took about three months after getting everything back online for business to
return to normal.
My point is that unless you respond quickly and effectively,
a disaster can cause your company to close its doors forever. It is therefore
critically important to make good decisions about which steps will be the most
effective in getting your company back online.
Prepare the plan
Regardless of the type of company you work for, step number
one should be finding a new place to set up shop. If your building isn’t
physically damaged, you might be able to skip this step. Otherwise, consider
using another property that your company already owns or leases, such as a
warehouse or a branch office. This will save money and time because you won’t
have to search for a new piece of property or waste time signing a lease.
Once you have a location to do business in, have the phone
company reroute your telephone and Internet service to the new location. In
my experience, this can be done within a few hours if the
phone company understands that it’s an emergency.
When connectivity has been established, it’s time to start
setting up some servers. This is where things get tricky. If your old servers
have been destroyed, then you won’t have a choice but to buy new hardware.
However, it can take weeks to get a check from the insurance company, so you
will likely be limited to the amount of cash that you have on hand, which
probably won’t be enough money to replace everything.
For example, if you paid $30,000 each for fifteen servers,
then it would cost you $450,000 to replace them all. If you don’t have that
kind of cash, think about which servers are the most critical and determine the
minimum amount of computing power needed to keep those services running
at a minimal level of functionality until you can get real servers.
You might decide that five of those fifteen servers are really critical to
keeping the business’s doors open, and although the
other ten are important, they don’t necessarily have to be
available today. You might also discover that while those servers run best on
quad-processor boxes, you can run the critical services on a single-processor
box in a pinch (with obviously decreased performance).
After performing this assessment, you might determine that
rather than spending $450,000 to replace all of your server hardware, you can
spend $15,000 on five high-end PCs and configure them to act as temporary
servers until you can buy replacement hardware.
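The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the dollar amounts and server counts are the ones from the example above, and the variable and function names are made up for this sketch.

```python
# Figures taken from the example in the text; names are illustrative only.
SERVER_COUNT = 15
COST_PER_SERVER = 30_000      # dollars paid for each original server
CRITICAL_SERVER_COUNT = 5     # servers judged critical after the assessment
TEMP_PC_BUDGET = 15_000       # total spent on five high-end PCs


def full_replacement_cost(count, unit_cost):
    """Cost of replacing every server with equivalent hardware."""
    return count * unit_cost


def cost_per_temp_pc(total_budget, pc_count):
    """Average amount available for each temporary PC."""
    return total_budget / pc_count


full_cost = full_replacement_cost(SERVER_COUNT, COST_PER_SERVER)
per_pc = cost_per_temp_pc(TEMP_PC_BUDGET, CRITICAL_SERVER_COUNT)
deferred = full_cost - TEMP_PC_BUDGET  # spending postponed until the insurance check arrives

print(f"Replace everything now: ${full_cost:,}")
print(f"Temporary PCs:          ${TEMP_PC_BUDGET:,} (${per_pc:,.0f} each)")
print(f"Spending deferred:      ${deferred:,}")
```

Plugging in your own counts and prices gives a quick sense of how much cash you actually need on day one.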
Getting back online quickly with minimal functionality is
important, but it is equally important to make sure that you bring the
appropriate systems back online first. The million-dollar question is: How do
you determine which systems should be brought back online first when everyone
is screaming at you because they think that their systems are the most
important?
Before you can bring anything business-related back online,
you will need to get some infrastructure in place. Therefore, your top
priority is to get at least one domain controller, a DNS server, and
possibly a DHCP server back online. Beyond this, the decision-making process
isn’t quite so clear-cut.
I recommend planning ahead of time and getting upper
management to make you a list of which systems take the highest priority in
times of disaster. If a disaster has already happened, though, you won’t have
that luxury, and it will be up to you to make that decision.
What constitutes a critical system varies widely from
company to company. However, if you want some general guidelines, I would bring a
mail server online first so that you can communicate with your customers and
employees and let everyone know that you are still in business. After doing so,
I would bring online the systems that produce the most immediate income. By
doing so, you keep the cash flowing and reduce the chances of the business
closing its doors as a result of the disaster.
You can also narrow the decision-making process down a bit
by deciding which systems are relatively unimportant. For example, it has been
my experience that departments such as Human Resources and Marketing will often
scream the loudest about needing to be brought back online, but their
needs can usually be considered secondary to the immediate requirements of the business.
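The general guidelines above (infrastructure first, then mail, then the systems that produce the most immediate income, then everything else) can be written down as a simple sort. The sketch below is one possible way to do that, not a prescribed method: the tier names, the system inventory, and the daily revenue figures are all hypothetical, and your own list should come from upper management whenever possible.

```python
from dataclasses import dataclass

# Tiers follow the order suggested in the text; a lower number comes back first.
INFRASTRUCTURE, COMMUNICATION, REVENUE, SECONDARY = range(4)


@dataclass(frozen=True)
class System:
    name: str
    tier: int
    daily_revenue: float = 0.0  # hypothetical estimate, used to order within a tier


def recovery_order(systems):
    """Sort by tier, then by highest daily revenue, then by name for stability."""
    return sorted(systems, key=lambda s: (s.tier, -s.daily_revenue, s.name))


# Hypothetical inventory; replace with your company's real systems.
inventory = [
    System("HR database", SECONDARY),
    System("Web storefront", REVENUE, 12_000),
    System("Mail server", COMMUNICATION),
    System("DNS server", INFRASTRUCTURE),
    System("Order processing", REVENUE, 8_000),
    System("Domain controller", INFRASTRUCTURE),
    System("Marketing file share", SECONDARY),
]

ordered_names = [s.name for s in recovery_order(inventory)]
for position, name in enumerate(ordered_names, start=1):
    print(f"{position}. {name}")
```

Having the order written down in advance, even in a form this simple, gives you something to point to when several departments insist that their systems come first.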
If you do decide to use high-end PCs in place of real
servers to quickly bring the most critical systems online with minimal
functionality, I recommend going to a mom-and-pop computer store rather than to
a large retail chain. You will likely get a better price and faster service, and
you will be able to fully determine the specs for the machines that you
are buying. Sure, you can pick up the phone and custom-order a machine from
Dell or Gateway, but you won’t get same-day service. On the other hand, a
small, independently owned computer shop will likely jump at the opportunity to
get a $15,000 order and will probably bend over backwards to get you the
hardware that day and to help you any way that they can.
Once you have the replacement hardware, it’s just a matter
of installing operating systems and restoring backups. You might also have to
do a little reconfiguring to compensate for differences in disk
structure or hardware capabilities. In addition, I recommend disabling any services
that aren’t absolutely critical so that you can reserve the temporary system’s
limited resources for your most critical applications.
When your most critical systems are up and running (even if
only at limited capacity), you can begin the process of rebuilding everything
else. This means coordinating efforts with your normal hardware vendor, your
insurance company, and whoever is repairing the damage to the old facility.