Business leaders must prepare for disasters made by man or Mother Nature with extensive, practiced recovery plans to avoid system shutdowns.
A Delta ground stop was lifted Monday morning following a 2:30 a.m. ET power outage in Atlanta that delayed and cancelled flights worldwide. Businesses should view this as a cautionary tale, highlighting the importance of quality data center power and disaster control systems.
Delta cancelled approximately 300 flights due to the outage. As of 10:30 a.m. ET, it had operated 800 of its nearly 6,000 scheduled flights. Delta customers heading to the airport on Monday should still expect delays and cancellations, according to a press release. Because inquiries are high and wait times are long, the airline's displayed flight status may also lag behind actual conditions, it warned.
Last month, Southwest Airlines cancelled 1,150 flights after a system outage. Though the system came back online within the day, hundreds of flights were backlogged.
Based on recent research, it's fair to say that what happened to Delta and Southwest could happen to a number of businesses. Some 57% of small and mid-sized businesses have no recovery plan in the event of a network outage, data loss, or other IT disaster, according to a Symantec study.
"Planning and executing disaster recovery exercises is something that should be done on a regular basis to find out these issues before they may be impactful," said Mark Jaggers, a Gartner data center recovery and continuity analyst. "The issue, which was also the case with Southwest Airlines, is not planning for partial failure scenarios that are harder to get to the root cause of and work around."
To avoid shutdowns like Delta's, company data centers should have redundant power and networking, preferably drawn from a power grid and a network provider that are completely independent of the primary ones, Jaggers said.
"Data centers are a huge piece of a disaster recovery plan," said mission-critical facility management professional Christopher Wade. "To have a reliable infrastructure, you have to minimize single points of failure." Business leaders should also ask about the experience levels of data center staff, as many of these companies are currently understaffed, Wade added.
Usually, large companies have a primary data center in one location and an alternate in another that is far enough away so the two do not experience the same disaster at the same time, said Roberta Witty, risk and security management analyst at Gartner.
"In today's world, the business expectation is that you're up and running quickly after a disaster," Witty said. "The 'always on' driver is changing the way organizations deliver IT in general, and so they are building out their data centers to be more resilient."
Faster recovery times
About 60% of organizations are moving to a recovery time objective of four hours or less, Witty said. Doing so successfully involves extensive planning. First, determine what business operations are mission critical. Then, consider factors that impact recovery time requirements, such as revenue loss, safety, and brand reputation, and build your recovery infrastructure accordingly. As more companies outsource data operations, a key consideration should be the third party's ability to meet your recovery requirements, she added.
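The planning steps Witty describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operation names, revenue figures, and tested recovery times are invented for the example, and the four-hour target simply mirrors the recovery time objective cited above. The sketch flags mission-critical operations whose last tested recovery time misses the target and lists the costliest gaps first.

```python
from dataclasses import dataclass

# Assumed target, mirroring the four-hour recovery time objective cited above.
TARGET_RTO_HOURS = 4.0


@dataclass
class Operation:
    name: str
    mission_critical: bool
    hourly_revenue_loss: float    # illustrative estimate, in dollars
    tested_recovery_hours: float  # hypothetical result of the last recovery exercise


def rto_gaps(operations, target_hours=TARGET_RTO_HOURS):
    """Return mission-critical operations that miss the target, costliest first."""
    gaps = [
        op for op in operations
        if op.mission_critical and op.tested_recovery_hours > target_hours
    ]
    return sorted(gaps, key=lambda op: op.hourly_revenue_loss, reverse=True)


# Invented sample data; none of these figures come from Delta, Gartner, or Symantec.
operations = [
    Operation("reservations", True, 50000.0, 6.5),
    Operation("crew scheduling", True, 20000.0, 3.0),
    Operation("marketing site", False, 500.0, 12.0),
    Operation("check-in kiosks", True, 30000.0, 5.0),
]

for op in rto_gaps(operations):
    print(f"{op.name}: recovers in {op.tested_recovery_hours}h, "
          f"target is {TARGET_RTO_HOURS}h")
```

A real assessment would also weigh the safety and brand reputation factors Witty mentions, which do not reduce to a single revenue number as neatly as this sketch suggests.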
Crisis management practices, such as the procedures Delta used to notify management and deal with customer fallout, usually get exercised every quarter. "The more you practice your crisis management procedure and communicating with your workforce, customers, suppliers, and partners, the better off you are," Witty said. "A plan that hasn't been exercised is not a workable plan."
Disaster recovery should not be something a company reviews once a year, Witty said, but an ongoing part of every new project.
"Your recovery environment has to stay in sync with production, which is where a lot of organizations fail," Witty said. "Build disaster recovery into a project lifecycle: whether it's a new product or a change in management, you have to go back and revisit your recovery plans."
The 3 big takeaways for TechRepublic readers
- Delta experienced a massive networked service stoppage Monday morning after a power outage in Atlanta, which offers a lesson in disaster preparedness and recovery for other businesses and data centers.
- About 57% of small and mid-sized businesses have no recovery plan in the event of a network outage, data loss, or other IT disaster, but these plans are key for mitigating natural and manmade disasters and keeping business operations running smoothly.
- Companies should build crisis management and proper communication into all new projects and management changes to ensure consistency.