Disaster Recovery

Not so gentle reminder: Business continuity lessons from a northeaster

Bob Eisenhardt compiled some questions for SMBs to ask themselves as they implement business continuity planning. The recent power outages in the northeast bring them into focus.

When will power return to one of my client locations that holds three medical offices? For the past two days this has been a worry, and short of driving 40 minutes and staring into darkness, I have little to do at these locations. When it does return, I may have PLENTY to do, so pending that, I thought I would review some business continuity planning (BCP) questions I have compiled for small business organizations to consider -- and relate them to the recent Nor'easter that wrecked the East Coast, but not Halloween!

Question 1: Who can declare a "disaster" or a service interruption? This should be easy: the local business owner can do that, but there must be a chain of command. If the primary owner is on vacation in China, communication becomes difficult. Two or three people must be able to close the business and implement plans as circumstances dictate. (Real world: text messages do not suffice; they can be ignored. A phone chain works best.)

Question 2: If your employees cannot report to your location, how long can you afford to wait to re-establish business levels? 24, 48, or 72 hours? The medical complex has now been closed for two days, but medical offices have a standing supply of incoming patients, so the first two days after reopening will be madness. Pure retail operations, however, do not have a line at the door! They can suffer irreparable loss and may have to close their business. Such a circumstance must be evaluated in advance. One week? It can happen.

Question 3: Who among your staff knows the most about implementing emergency procedures? The owner should delegate this to one or two people below them in the hierarchy to co-cover the situation and share responsibilities. These procedures should be WRITTEN DOWN and distributed to staff as part of the employee handbook.

Question 4: Do your employees know what to do if they cannot get to work? This raises the question of a co-location. Among my medical clients, one is a large optical house, and I also support a competing practice about a mile away. Bluntly, they HATE each other, BUT in the event of ONE closing for a significant period of time AND pending PATIENT emergencies, I can put one part of one into the other for a short time. This option requires courage and guts. But for the emergency care patient, it is absolutely VITAL to implement.

Question 5: Can you process payroll and financial transactions from your recovery site? As their IT department, I can quickly set up my resources for basic accounting and patient access, with a 24-hour window to uptime. It's been tested and verified over the years. I know how their networks are built and how the software runs.
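The chain of command from Question 1 can be written down as an ordered phone chain. The following is only a minimal sketch: the roles, phone numbers, and the `try_call` callback are hypothetical placeholders, not taken from any real client's plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Contact:
    name: str
    phone: str


def first_reachable(
    chain: Sequence[Contact], try_call: Callable[[Contact], bool]
) -> Optional[Contact]:
    """Phone each person in order; return the first who answers, else None."""
    for contact in chain:
        if try_call(contact):
            return contact
    return None


# Hypothetical chain: the owner first, then two delegates with equal authority.
chain = [
    Contact("Owner", "555-0100"),
    Contact("Office manager", "555-0101"),
    Contact("Lead technician", "555-0102"),
]

# Simulate the owner being on vacation and not picking up.
unreachable = {"Owner"}
decider = first_reachable(chain, lambda c: c.name not in unreachable)
print(decider.name if decider else "Nobody reachable")
```

The point of the ordering is that authority to declare an interruption passes down the list automatically, so nobody has to wait on a person who cannot be reached.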

How it turned out this time

It was at this point in writing that I took a break, and at 6:51 p.m. Tuesday night power came back on. I was able to RDP into a server. I immediately called and notified my clients by phone and email that power was restored. Then, after a little sleep and early in the morning, I was on the road with coffee to see my clients' offices BEFORE staff arrived and mitigate any issues as they arose. There were, thankfully, only a few.

There were no server failures -- very thankful there. No workstation failures either, save one critical station that managed only 31,000-odd retina scans; it suffered an OS crash. This was a Windows OS failure, and this particular computer, being HIPAA certified, was one I never touched because I was actually forbidden to do so. After the usual diagnostics, the vendor was contacted and a replacement system should be arriving tomorrow morning. I then provided the client with the storage path and login information to the server, as all 31,000-odd images are server managed. Their technician will need this data ASAP.

There was one printing failure, easily corrected. At the second office, a very inexpensive 5-port switch died and was easily replaced during the day. The third office suffered no problems at all, being just two Windows stations connected via wireless router.

Throughout this, my business continuity planning and familiarity with the office environment ensured that I could manage almost any contingency that arose. DR procedures could be put into place within 24 hours as I also have a plethora of desktop systems available for immediate placement as needed.

A positive upshot of this was that my largest client is now convinced to proceed to a gigabit network, with an estimate presented and to be implemented during the next two weeks.

Summary: Disaster recovery was not an issue in this circumstance and business continuity protocols were firmly enough in place that even the retina image failure is being managed in a timely manner. Oddly enough, the primary vendor of this product is without power as of this writing!

Have you already tested your BCP measures after this early season storm?

5 comments
cpr

Who can declare a "disaster"?: There should be one command group (or person) authorized to declare an emergency. Just because an incident occurs doesn't mean a disaster should be invoked. Information should be evaluated and checked, and an emergency declared when applicable.

How long can you afford to wait?: Part of any disaster recovery plan should state what should occur if an emergency lasts certain time periods. If it looks like a short emergency duration, then do this ... If it looks like it will last longer, then follow plan B, etc.

Communication is very important. A single authority (group/person) should be charged with communicating with employees, telling them where to report, what they need to do, etc. Information should also be available for the media, clients, suppliers, bankers, etc. if the outage persists. There should be a single authority issuing 'official' statements; this lends credence that the people who are running the show know what they are doing. This information is provided by the central command group/person.

Who knows how to implement your emergency procedures?: A disaster recovery manual should be available, and up to date. We know this usually won't happen. At the least, there should be an outline describing what steps to take for various emergency scenarios (power outages, evacuations, off-site locations, etc.).

Do your employees know what to do if they cannot get to work?: This is where your colleagues/competitors can help. You should put your customers first (instead of your current situation). Try to set up an arrangement where your colleagues/competitors will help you service your customers. There may be an inherent risk, but your customers will appreciate this.

Can you process payroll and financial transactions?: The key is to keep money flowing into your business, and pay your bills when you have recovered sufficiently. Your first recovered application should be the accounts receivable system. Notify your suppliers of your issues (and indicate when they should receive payments). You can pay your employees just by re-running the last payroll again, then manually adjusting any changes when the situation has settled down a bit.
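The "if it looks short, do this; if longer, follow plan B" idea can be captured as a small lookup table. This is only an illustrative sketch: the hour thresholds and the plan descriptions below are assumptions made up for the example, and a real plan would set its own.

```python
from bisect import bisect_right

# Hypothetical tiers: (minimum expected outage in hours, action to take).
# Must be sorted by threshold, starting at 0.
PLAN_TIERS = [
    (0, "Plan A: wait it out and phone staff with status updates"),
    (24, "Plan B: bring up accounts receivable and payroll at the recovery site"),
    (72, "Plan C: arrange for a colleague or competitor to service customers"),
]


def plan_for(expected_outage_hours: float) -> str:
    """Return the action for the highest tier whose threshold has been reached."""
    if expected_outage_hours < 0:
        raise ValueError("expected outage cannot be negative")
    thresholds = [hours for hours, _ in PLAN_TIERS]
    index = bisect_right(thresholds, expected_outage_hours) - 1
    return PLAN_TIERS[index][1]


for hours in (6, 24, 100):
    print(hours, "->", plan_for(hours))
```

Keeping the tiers in one table means the whole escalation policy fits on a single page of the written manual, and changing a threshold does not require touching the logic.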

reisen55

Because of issues with backups, I generally have my networks automatically shut down in the late evening hours and reboot, per BIOS and the Autologon utility, in the early morning. A big advantage is that shutdowns are common and the virus scans and updates are self-managed OFF HOURS when employees are not working. I have long advocated a UPS on each computer, a good idea albeit an expensive one, but a single workstation failure is not critical. The Optos failure is more significant, so I shall argue for a UPS on this single station. A server rebuild was done for one account earlier in the summer, wherein a GHOST image and a replacement drive were employed to COMPLETELY bring back a dead drive in 3 hours, from start to Active Directory finish. A good job that.

CharlieSpencer

"There were no server failures very thankful there. No workstation failures either, save one critical station that managed only 31,000 odd retina scans; it suffered an OS crash."

It's always a good idea to shut down as much equipment as possible if you can see the outage coming. While it may not be feasible for a medical or other vital facility, there are plenty of cases where it's practical. Often people just forget to turn off something they're used to leaving on all the time. Here in the southeastern US, hurricanes give us plenty of opportunities to test our continuity plans.

Oh, and I've got a couple extra GX620 towers, just in case you need one :D

reisen55

I am now writing from one of the three offices, where I was able to back up the most recent set of retina images off of the failed Optos station noted above. Always carry BartPE with you wherever you go. Since this system had also been declared dead, I felt some freedom with it, ran Partition Magic and resized the partitions. LO AND BEHOLD it came to life again, so I was able to properly archive the week's set of images. The primary cause of failure is really a thermal failure on the processor coupled with a Windows boot problem. The latter is resolved and thermal events are no longer being reported. The system is an Optiplex GX620 tower. I also now have confirmation that I can indeed be more intrusive on this system than before, and it is on my clean-out list. This is a HIPAA certified system, so I was actually under orders to do as little as possible beyond really light monitoring. So it goes.

BCP and DR planning are critical functions of the IT department, not often needed, but when we do need them ... be thankful you have thought it through. And tested it too.

CharlieSpencer

Is there a reason you copied and pasted the content of the original article into a Reply?
