Few enterprises have defined disaster recovery (DR) plans for big data, but as big data plays a more mission-critical role in companies, DR should become a focal point.
Not everyone agrees, of course.
When Lockwood Lyon, a systems and database performance specialist, asked database administrators how they implemented disaster recovery in their big data environments, he said there were two common responses:
- that big data is for analytics, not mission-critical data, so DR plans are not necessary; and
- that big data is too big for backup and DR because of the amount of space and recovery time required to accommodate large data sets.
This point of view can last only as long as big data is not considered mission-critical for continuous availability. But more organizations are planning to run big data analytics in dynamic, online sales environments that depend on analytics to respond to changing consumer behaviors, or to run transit systems based on the ever-changing status of the system. As companies become operationally and strategically dependent on analytics for business outcomes, it is only a matter of time before IT starts getting asked about its plans to back up, restore, and recover from a disastrous big data outage.
You can't apply the same set of DR best practices to big data that you use for traditional systems of record. That's why IT departments developing a big data DR plan should consider these key points.
1: How quickly will you need to restore big data?
Big data required for near real-time analytics should be stored so it can be recovered rapidly. A cloud-based data storage option might be the answer, or onsite rapid storage options such as ongoing replication of in-memory storage across multiple servers. Big data that is needed, but not instantly, might feasibly be stored on slower media (possibly even tape), although recovery times are longer.
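As a rough illustration, the tiering decision above can be expressed as a simple policy lookup that maps a data set's recovery-time objective (RTO) to a storage tier. The tier names, thresholds, and the `choose_storage_tier` function are hypothetical assumptions for the sketch, not features of any particular platform:

```python
# Hypothetical mapping of recovery-time objectives (RTOs) to storage tiers.
# Tier names and thresholds are illustrative assumptions, not vendor features.
def choose_storage_tier(rto_minutes: float) -> str:
    """Pick a storage tier based on how quickly the data must be restored."""
    if rto_minutes <= 5:
        # Near real-time analytics: replicated in-memory copies on multiple servers
        return "in-memory-replica"
    elif rto_minutes <= 60:
        # Fast but not instant: cloud or onsite disk-based storage
        return "disk-or-cloud"
    else:
        # Long recovery windows tolerated: cheaper, slower media such as tape
        return "tape-archive"

print(choose_storage_tier(2))     # in-memory-replica
print(choose_storage_tier(30))    # disk-or-cloud
print(choose_storage_tier(1440))  # tape-archive
```

The point of the sketch is that the tier is driven by the business's recovery requirement, not by where the data happens to live today.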
2: Which data should you recover?
Big data is massive — you don't need to target all of it for rapid disaster recovery. Meet with end business units to reach consensus on which data must be recovered in a DR effort. Data retention plans should be in place so that extraneous data is not needlessly stockpiled, which only extends data recovery times.
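One way to keep extraneous data from inflating recovery times is to tag data sets with the criticality and retention decisions the business units agreed on, then scope the rapid-recovery set against those tags. The `Dataset` schema, flag names, and two-year window below are hypothetical, just to sketch the idea:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Dataset:
    name: str
    last_used: date
    business_critical: bool  # flag agreed on with the business units

# Hypothetical retention rule: keep critical data, and drop anything
# untouched for more than two years from the rapid-recovery set.
RETENTION_WINDOW = timedelta(days=730)

def rapid_recovery_set(datasets, today):
    """Return only the data sets worth targeting for rapid DR."""
    return [d for d in datasets
            if d.business_critical and today - d.last_used <= RETENTION_WINDOW]

datasets = [
    Dataset("sales_clickstream", date(2015, 6, 1), True),
    Dataset("old_sensor_dump", date(2010, 1, 1), True),   # stale
    Dataset("ad_hoc_scratch", date(2015, 5, 1), False),   # not critical
]
print([d.name for d in rapid_recovery_set(datasets, date(2015, 7, 1))])
# ['sales_clickstream']
```

However the tags are implemented, the key is that the filter encodes decisions the business units signed off on, not guesses made by IT during an outage.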
3: Does the big data DR plan meet your regulatory requirements?
Some industry regulators require companies to retain their big data for a significant number of years, while others don't. Some companies, especially if they depend on long-term trending information, opt to store big data for decades, while others don't see this information as critical. Your big data DR plan needs to meet the business and regulatory requirements of your company.
4: What is your big data recovery point?
With transactional data, the DR recovery point is the point of recovery closest to where the interruption of service occurred. To a degree, this is also true with big data, but there is one additional consideration: You need to determine in what "form" you want to recover your big data. In other words, will you be recovering "raw" big data that appears just as it is when it first enters your systems? Or will your strategy be to recover big data that has already been ETL'ed (extracted, transformed, and loaded) into a refined form of big data that analytics can actually be performed on? Most companies opt for the latter.
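The raw-versus-transformed choice is essentially a trade-off between storage cost and time to resume analytics: recovering already-transformed data avoids re-running the ETL pipeline after a restore, which is why most companies opt for it. A toy model of that trade-off (the function and all timings are hypothetical):

```python
# Toy model of the recovery-form trade-off. All figures are hypothetical.
def recovery_time(restore_hours: float, recover_raw: bool,
                  etl_hours: float = 0.0) -> float:
    """Total hours until analytics-ready data is back online.

    Recovering raw data means the ETL (extract, transform, load) step
    must be re-run before analytics can resume; recovering transformed
    data means the restore alone is enough.
    """
    return restore_hours + (etl_hours if recover_raw else 0.0)

# Recovering transformed data: the restore alone suffices.
print(recovery_time(restore_hours=4, recover_raw=False))               # 4.0
# Recovering raw data: restore, then replay a 10-hour ETL run.
print(recovery_time(restore_hours=4, recover_raw=True, etl_hours=10))  # 14.0
```

The flip side, not modeled here, is that keeping transformed copies ready for recovery consumes extra storage on top of the raw data.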
5: Update your vendor contact lists, and practice your DR
Big data DR plans need to be coordinated and tested with end user departments and vendors, just as transactional DR plans are. Although end business users don't always readily cooperate in plan preparation and testing efforts, they understand the need. It is getting the big data vendors to the DR table that might be most difficult because, thus far, their enterprise clients haven't expected much from them.
Do you have a big data DR plan in place? If so, what key considerations to creating a DR plan would you add to our list? What bumps, if any, did you encounter when formulating and/or testing the plan? Let us know in the discussion.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.