Disaster recovery: Lessons learned from a volcano

Patrick Gray recounts his experience with the volcano aftermath while working in Europe and how disaster preparedness could have eased some problems.

Like thousands of other travelers, I spent several unplanned days in Europe recently due to the ash cloud spewing from the unpronounceable volcano in Iceland. I joked with friends and colleagues that despite several hundred thousand miles of business travel under my belt, if you asked me to write 50 reasons I would be stuck somewhere for several days, I most likely would not have even remotely considered including a volcano on that list.

During the 14-hour drive from the shuttered Paris airport to clearer skies over Madrid in one of the last rental cars in Paris, I contemplated what lessons this incident held for IT and general disaster planning and recovery. Plotting my escape from the ash cloud taught me four lessons that apply broadly to corporate disaster planning.

Lesson 1: Focus on outcomes not scenarios

Too much corporate disaster planning focuses on the scenario that triggers a disaster. What if there's an earthquake? What if two terrorists attack an airplane? Three? What if... It's obviously a time-consuming exercise to contemplate every single disaster scenario, and even the most imaginative group would likely miss some (a volcanic ash cloud from Iceland, for instance). Rather than considering all the possible incidents, consider potential outcomes. You don't have to be the most creative person to consider that "something" could cause the majority of European airports to be closed, and this can be readily planned for. Whether it's earthquakes, plagues or aliens that shut down the airspace, the reactive planning is similar.

Amazingly, it appears no one had ever contemplated a large-scale shutdown of European airspace and days were spent just trying to get the right parties from each impacted country on the phone together. Consider widespread outages of critical corporate infrastructure when planning potential disasters. What if you completely lose external connectivity? How will you react and what parties need to be brought together to determine how to proceed? This is one of the few areas where good, reactive plans that cover failure of 5-10 critical pieces of infrastructure will trump trying to consider every potential contingency.

Lesson 2: Provide empowerment

There was an interesting dichotomy between how I reacted to the airspace closure and how some of the other travelers reacted. I spend more time than I would like in airplanes, so I am probably more savvy than most, and was also travelling on business, with more flexibility in terms of available options and monetary resources. However, the news was filled with people sleeping on cots in inactive airports, at the total mercy of the airlines, who were in turn at the mercy of the whims of nature.

After the second day of flight closures, I booked contingency tickets out of Madrid, which was far enough south to be outside the impact of the ash cloud. As I searched for flights I literally saw open seats fill before my eyes as travel agents, airlines and stranded travelers attempted to arrange alternatives out of Europe. Since I was empowered to determine and execute my own contingency plan, I had alternative flights and a rental car on standby the moment US Airways called to cancel my flight out of Paris.

Consider who should be empowered to make decisions to react to a disaster in your organization. This is likely not just one person, but a group of people who may have to act autonomously and without your organization's usual approval and management oversight. If communications are disabled, you don't want a remote sales office dead in the water since no one is allowed to do anything without approval from some entity that they can no longer contact. This group should also be granted special authority as they react to a disaster, since they will likely have to seek creative solutions to whatever problems surface, with little or no outside help or oversight.

Lesson 3: Have a price tag

While I was willing to put up with a long drive and absorb the supply-and-demand-related price increases in the few modes of travel that were operating, I did have a price and time cap on what I was willing to endure to get out of Europe. I heard rumors of someone who had booked a berth on a merchant ship for a 22-day, multi-stop voyage across the Atlantic, and while I admire taking contingency planning to this level, I'm not sure a bunk on a boat for nearly a month would be worthwhile.

Each piece of your critical infrastructure should be assigned a price tag that you are willing to pay to restore it, in terms of time, financial and human resources. If human life or a major revenue stream is at stake, that price tag should obviously be higher than for a non-critical service. Without some form of ballpark guidance, delivered in advance, people trying to recover from a disaster may spend too much, or become penny wise and pound foolish in their recovery attempts, creating an even larger revenue impact in the long term.
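As a rough illustration of what such advance guidance might look like when written down, here is a minimal Python sketch. The service names, the cap values and the within_cap helper are all hypothetical and invented for this example; a real plan would set its own figures for time, money and people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryCap:
    """Upper bounds agreed in advance for restoring one piece of infrastructure."""
    max_hours: float  # time cap
    max_cost: float   # financial cap, in whatever currency the plan uses
    max_staff: int    # human-resources cap


# Hypothetical services and caps, for illustration only.
CAPS = {
    "external_connectivity": RecoveryCap(max_hours=8, max_cost=50_000, max_staff=6),
    "internal_wiki": RecoveryCap(max_hours=72, max_cost=2_000, max_staff=1),
}


def within_cap(service: str, hours: float, cost: float, staff: int) -> bool:
    """Return True if a proposed recovery option stays inside the pre-agreed cap."""
    cap = CAPS[service]
    return (
        hours <= cap.max_hours
        and cost <= cap.max_cost
        and staff <= cap.max_staff
    )


# A critical service justifies a costly fix; a non-critical one does not.
print(within_cap("external_connectivity", hours=6, cost=30_000, staff=4))  # True
print(within_cap("internal_wiki", hours=24, cost=10_000, staff=1))         # False
```

The point of the sketch is only that the caps exist before the disaster does, so that whoever is empowered to act can check an option against them without waiting for approval.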

Lesson 4: Figure out who to talk to

One of the flabbergasting outcomes of the airspace closure was what looked like gross incompetence as various European governments and agencies struggled to develop a joint approach to dealing with the disaster. It is easy to paint these agencies as inept with the benefit of hindsight and a seat on the sidelines, but I can only imagine the difficulties of assembling, in a matter of hours, a multi-agency, multi-country group that can develop a joint strategy and implement it locally.

As part of your disaster planning, consider what internal and external entities will need to be contacted and coordinated. Is that remote data center that is a cornerstone of your recovery plan really managed by those nice people that sold you the service? Are there other organizations that you might need to coordinate with to restore key functionality? Is there some benefit to beginning an outreach program now rather than when disaster strikes?

While the Icelandic volcano may have seemed a distant and unlikely event in countries oceans away from Europe, it should serve as a case study for all manner of disaster planning exercises. Planning for outcomes rather than trying to contemplate every "trigger scenario," providing managerial and financial empowerment and getting the right parties together before a disaster strikes are just some of the lessons that the volcano can impart to your organization.


Patrick Gray works for a global Fortune 500 consulting and IT services company and is the author of Breakthrough IT: Supercharging Organizational Value through Technology as well as the companion e-book The Breakthrough CIO's Companion. He has spent ...
