We had a major outage at work this week. It will end up costing the company a small fortune, so I hope that the company will learn its lessons from the episode; otherwise it will be a lost opportunity of the worst kind.
Planning for disaster is essential. OK, so if my home laptop gets dropped, it will hurt me a little, but nobody will be out of work, nor will I lose any more than the cost of a replacement. It's a different story when you are talking about a server that is running an application upon which a lot of people rely. If you have workers standing idle, the costs start to mount up. It isn't only their hourly pay that is running to waste, it is also the loss of production, the annoyance to customers, and the sheer frustration suffered all round.
All this needs to be budgeted for. As always, the people with control of the purse strings will have to be convinced that backup and disaster recovery plans are necessary. After all, nobody wants to spend money on equipment they hope never to use, yet putting a duplicate system in place is a minor cost compared with the amount of trouble a loss of service will cause.
Working out the worst-case scenario is important. In the words of the Dr. Pepper advert, you have to ask yourself, "What's the worst that can happen?"
Basically you have to plan for a total loss — whether by theft, fire, earthquake, or terrorist action — of your system. You have to decide how long you can be without that system before you start to lose an unacceptable amount of service.
So what is the worst that can happen? Well, for us it was the call logging system going down on Monday morning and not coming back up until Thursday afternoon. It was a very stressful week, with work arriving by phone, usually several calls at a time. The office staff were working flat out taking calls, writing down the details, and calling the jobs through to the field engineers. In turn, we had to keep notes of everything and pray that we didn't need to use the system to order replacement parts or look up any customer information.
Somehow we managed to get through the week without anyone getting killed, but it stretched our customer service skills to the limit. I spent a lot of time apologizing to customers for late arrivals, and it didn't help that my nearest colleague was away on a training course, so I had to cover his area as well as my own. The good news is that this week I have an extra day off for the May Day bank holiday, so I will take the extra time to unwind on the beach.
I sincerely hope that this catastrophe doesn't happen again, but if it does, I trust that the application support people will have a better plan in place.