Disaster Recovery

20 things IT must do before, during and after a disaster

A panel of IT professionals outlines 20 things every IT department should do to prepare for, survive and recover from a disaster.

In April 2011, I appeared as a guest on the ZDNet Webcast, "20 Ways to Prepare, Survive and Recover from an IT Disaster."

During the live, one-hour show, ZDNet blogger David Gewirtz, TechRepublic blogger Scott Lowe, TechRepublic Editor-in-chief Jason Hiner, and I discussed the critical importance of preparing for and responding to a disaster--natural or man-made.

Here's the complete list of 20 things the panel discussed:

  1. Planning is key
  2. Design disaster plans with "APIs"
  3. Backup, backup, redundant backup
  4. Set up alternate sites and backup HQs
  5. Understand your entire supply chain
  6. Choose vendors with rock-solid DR plans
  7. There will be lawyers
  8. There will also be germs
  9. Define your chain(s) of command
  10. Monitor the implementation of your plan
  11. Test your plan with disaster drills
  12. Operate DNS servers in multiple locations
  13. Create a database of important master lists
  14. Learn to ruthlessly prioritize
  15. Put "the ship" first
  16. Plan for people as well as servers
  17. Develop and pack "bolts" kits
  18. Train and educate everyone
  19. Build for robustness and failure
  20. Be flexible

Every IT professional should watch this webcast. You can view the recorded Webcast through ZDNet's download directory or by clicking the link above.

Note: This Webcast was sponsored by HP and Intel. You must be a registered TechRepublic or ZDNet member to view the Webcast and the registration information you provide will be shared with the sponsors.

About

Bill Detwiler is Managing Editor of TechRepublic and Tech Pro Research and the host of Cracking Open, CNET and TechRepublic's popular online show. Prior to joining TechRepublic in 2000, Bill was an IT manager, database administrator, and desktop supp...

21 comments
pdavis7

This is all well and good, but unless top management wants to put in the resources, i.e., money, nothing is going to get done. I have asked for years to implement some level of DR besides backups. Off-site DR location? Only in my wildest dreams. Spare equipment? We don't have enough money to get enough equipment for production. I wrote up an IT DR proposal several years back, but I know it is just to CMA in case there really is a disaster, as top management would never fund it.

reisen55

The value of testing: a server in a medical office collapsed Sunday morning (as I had warned staff it would) -- a bad hard drive. Because I had tested server-rebuild protocols in the past, I was able to reconstruct the server and have it back online in three hours. True!

fhrivers

I don't care how good your DR plan is on paper; it's just fantasy and theory until you actually test it out. My company currently thinks it's a good idea to make another branch 12 miles away the "DR site," even though that branch is on a peninsula. Also, their idea of a disaster is a hurricane, fire, flood, or tornado, for which they calculated a 24-hour RTO. Well, that's fine, but what about a fried HDD controller or motherboard? Will the 24-hour RTO still be respected? That's why I've been pushing for a multi-tiered DR plan, one that covers both hardware "disasters" and natural disasters with the appropriate RTOs for each situation, plus disaster drills. Like fire drills, I know it will annoy people to be inconvenienced for a little bit, and it's hard to get humans to care about something until it actually happens (and it's too late), but I would be derelict in my duty if I did otherwise.
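The multi-tiered idea above can be made concrete by modeling the plan as data, so each class of failure carries its own recovery-time objective instead of one blanket 24-hour figure. This is only an illustrative sketch -- the tier names, example scenarios, and RTO values below are hypothetical placeholders; real numbers should come from your own business impact analysis.

```python
# Hypothetical sketch of a tiered DR plan: map each disaster class to
# its own recovery-time objective (RTO). All values are illustrative.
from datetime import timedelta

DR_TIERS = {
    "hardware": {  # component-level failures recoverable on-site
        "examples": ["disk controller failure", "motherboard failure"],
        "rto": timedelta(hours=4),
    },
    "site": {  # whole-site events requiring the alternate location
        "examples": ["hurricane", "flood", "fire", "tornado"],
        "rto": timedelta(hours=24),
    },
}


def rto_for(scenario: str) -> timedelta:
    """Look up the recovery-time objective for a named scenario."""
    for tier in DR_TIERS.values():
        if scenario in tier["examples"]:
            return tier["rto"]
    raise KeyError(f"no DR tier covers scenario: {scenario}")
```

Keeping the plan as a data structure also makes it easy to drive disaster drills from it, one drill per tier, so the hardware tier gets exercised as often as the natural-disaster tier.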

Ron_007

I thought there would be some substance to the story. Publishing a bare list with no supporting explanations is ... disappointing. Telling me I have to listen to an hour or so of webinar so I can write your story is ... How are we supposed to know what you mean when you make points like "Design disaster plans with 'APIs'," "There will also be germs," "Put 'the ship' first," or "Develop and pack 'bolts' kits"? I'll take issue with only one point: "Choose vendors with rock-solid DR plans." Other than JIT operations, how many shops do you know of where IT can insist that the purchasing department include "rock-solid DR plans" in vendor requirements? It sounds good on paper; I just don't see it applying in most real-world situations I've worked in. Price, price, price (and maybe a little quality).

jsbattista

Load the servers into your truck and GTFO.

sfenner

Since I haven't yet seen the webcast, this post may be out of turn. However, I think you've forgotten the most important piece: make sure your recovery plans match the organization's needs. If your business hasn't done a risk assessment and business impact analysis to provide the recovery requirements, then you don't know if you are recovering the most critical systems. IT recovers the systems, but the business recovers the business. Make sure they are on board and help them be prepared. Lastly, life safety is always first. It's good that Aon recognized that in 2001. If your people and their families are not in a safe place, then you have to help them get to that situation first. Otherwise, they are not going to be available to you. IT disaster recovery is only a very small part of a good business continuity management program. If your DR program is not part of a BCM program, no matter how good it is, you're not prepared.

jfuller05

Our data is backed up every night on a network drive, backed up to our vendor, and then backed up in three different locations across the U.S. via our vendor. So, I think we're good in the backup, backup, redundant backup step of the disaster recovery plan.

AnsuGisalas

The list mixes things from every level: practical, tactical, theoretical, strategic, value... Many of them can be compressed into the first rule for management: "Understand your job." That one covers most of the non-practice items on the list. And "planning is key" is definitely subordinate to it; how would it be possible to plan without understanding the alpha and omega of the organization?

reisen55

I was the system administrator for Aon Consulting, 101st floor of the South Tower. My servers crashed 103 floors to the ground, a non-recoverable incident!! As Aon has a global network, the IT group (after shock effect) was rebuilding within 48 hours, putting together the New York leg of the network in a variety of locations. I was one of the very few who actually HAD a laptop that performed a wide variety of roles. Our backup tapes for our division of Aon went to Iron Mountain on September 10, 2001. Lucky indeed. Aon put people first (they have forgotten that rule of late). Just knowing who survived and did not was enough for one day. Eventually, over 1,200 systems and monitors were delivered to 685 Third Avenue, temporary location, where GHOST performed wonders under a pressure environment. Be prepared to WORK!!! Then, go home. You've done your day and after 9-11-01, going home is very good indeed.

oldbaritone

It's always tough to justify additional costs, right up until the disaster occurs. Fortunately most large companies can perform off-site backups these days via their intranet, and smaller companies can utilize one of the many online storage companies for off-site backups. OTOH, when disaster on the scale of the Japanese Tsunami strikes, and the entire city is leveled, it's going to be a while before most of the IT departments recover, unless they are ready/willing/able to relocate immediately to a suitable unaffected location.

Bill Detwiler

In April 2011, I appeared as a guest on the ZDNet Webcast, "20 Ways to Prepare, Survive and Recover from an IT Disaster." During the live, one-hour show, ZDNet blogger David Gewirtz, TechRepublic blogger Scott Lowe, TechRepublic Editor-in-chief Jason Hiner, and I discussed the critical importance of preparing for and responding to a disaster--natural or man-made. How would you rate your IT department's level of disaster readiness? Take the poll and let me know: http://www.techrepublic.com/blog/itdojo/20-things-it-must-do-before-during-and-after-a-disaster/2500

Bill Detwiler

You're 100 percent correct. Testing and training are critical parts of the disaster preparedness process. And after the tests, you should re-evaluate your plan and make necessary improvements. I encourage you to watch the Webcast. The other panelists and I stress this very point.

Bill Detwiler

This post was designed to highlight the content of a recent, hour-long Webcast. I encourage you to watch the Webcast, where the other panelists and I cover each of these 20 bullet points in greater detail.

SgtPappy

a really huge battery backup?

maclovin

Couldn't have said it better.

jashelby

OK, so you have triplicate backup copies. Have you demonstrated a reliable capacity to restore functionality from each one of those stored backups? If not, then your backups are quite possibly just useless wastes of space. Preparation is great, but practice is better.
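The point above -- that an unverified backup may be a useless waste of space -- can be automated. This is a minimal sketch, not any particular backup product's API: the "restore" step here is a simple file copy standing in for your real restore procedure, and verification is a checksum comparison against the source.

```python
# Minimal sketch of backup verification: "restore" a backup into a
# scratch directory and confirm its checksum matches the source file.
# shutil.copy is a stand-in for a real restore step (tape, snapshot, etc.).
import hashlib
import os
import shutil
import tempfile


def sha256(path: str) -> str:
    """Compute the SHA-256 digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source: str, backup: str) -> bool:
    """Restore the backup to a scratch location and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = os.path.join(scratch, "restored")
        shutil.copy(backup, restored)  # replace with your actual restore
        return sha256(restored) == sha256(source)
```

Run something like this on a schedule against each of the triplicate copies; a backup that has never passed a restore test should be treated as if it doesn't exist.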

AnsuGisalas

Planning for the inconceivable is pretty difficult, but are there any things/lists/data you'd have wanted to have, after the fact?

SgtPappy

is part of disaster recovery.

reisen55

We led the way. I would stress pre-planning and also testing and verification wherever possible. The latter is awful if you are a full data center, but a once-a-year simulation is a vital test of the systems end. IBM Sterling Forest, NY offers such testing, and my former manager at Aon has been there for such simulations. Document everything and revise as needed. Pre-planning is difficult, as each individual firm and site location is unique. Flood zone? Fires nearby? All must be taken into account as risk factors and weighed accordingly. Do not forget PEOPLE, as they are the most important commodity. If you need data tapes but getting OUT is life-saving, save the life. I lost a colleague who went back for his set of data tapes that day. True. Prepare GO-BAGS: a leave-the-building satchel with everything you need for the day, INCLUSIVE OF CASH. Prepare your rebuild scenario too, a RETURN TO NORMAL plan.
