Almost everything you hear these days about Big Data concerns rapid, actionable access. Leading the charge is the enterprise desire to acquire and act on near real-time data.
But there is also a historical side to Big Data: individuals throughout the enterprise need to peruse older data in order to identify trends, or to more fully understand a present business problem and the causative factors that have contributed to it over time, so that the problem can be alleviated.
In this context, historical data also requires immediate access, without IT having to undertake a massive data restore. Strategically, this means that sites should be planning for ready, painless user access (without IT intercession) to relevant historical information, as well as split-second access to newly created data.
This more “mission-critical” attitude toward historical data does not come naturally to IT. For years, the approach to data archiving has been geared toward saving data for regulatory and backup purposes. Backups were made to slower (and cheaper) disk, or even to tape systems. Occasionally this data was needed by regulators, but that was about it. During data restorations, IT sometimes discovered that the tapes the data was stored on had degraded to the point of being unreadable.
It is this “backup” mentality for archiving that is now carrying over to big data, and it potentially limits the end-to-end value that enterprises can derive from all of the big data they are now collecting and mining.
Here are two use cases:
- A financial services marketing team sees that buying patterns in a particular population segment are changing, and it wants to know why. It can clearly see the changes in near real-time analytics executed on big data and standard data, but what it really wants to understand is the causative factors driving the change, and when those factors began to affect buying. If marketing can find this out, it will be in a better position to offer products that are relevant to buyers, and to act proactively toward other customer segments where the same buying pattern could emerge.
- A regional hospital with a large territorial footprint notes that colon cancer rates among patients in one geographical zone are higher than among patients elsewhere in the region. The hospital wants to look into causative factors that could be environmentally related, or perhaps tied to other characteristics of the area. For this research, it needs fast access to ten years of historical data.
Uncovering causative factors in scenarios like these requires easy, flexible, day-to-day access to big data, including data that is historical. This is the potentially mission-critical work that gets missed if IT focuses solely on disaster recovery and backup for its older big data, and not on multi-purpose archiving.
So what are some steps that IT can take to ensure that its big data storage strategy is broad enough to meet the full range of business information access needs?
Take a broader view of mission-critical work that could require big data. Big data analytics is great for meeting instantaneous “need to know” demands, but there is also highly important research on historical data that depends on immediate data access.
Review data archiving strategies with the end business. Should you automatically migrate big data from tier-one storage to cheaper disk after 30 or 60 days of non-use? Or should you do something different? Whatever your policies are, they should be reviewed annually with end-business decision-makers.
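As one illustration (not a prescription), an age-based tiering policy like the 30- or 60-day rule above can be expressed as a simple sweep. The sketch below is hypothetical: the directory names, the function name `archive_stale_files`, and the use of last-access time as the "non-use" signal are all assumptions for the example, not a recommendation for any particular storage product.

```python
import os
import shutil
import time

def archive_stale_files(tier_one_dir, archive_dir, max_idle_days=30):
    """Move files not accessed in max_idle_days from tier-one
    storage to a cheaper archive tier. Hypothetical sketch:
    real tiering tools track access far more robustly than atime."""
    cutoff = time.time() - max_idle_days * 24 * 3600
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in os.listdir(tier_one_dir):
        path = os.path.join(tier_one_dir, name)
        # Only migrate regular files whose last access predates the cutoff
        if os.path.isfile(path) and os.stat(path).st_atime < cutoff:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

Whatever the mechanism, the key point of the step above stands: the threshold (30 days, 60 days, or something else) is a business decision, and should be revisited with business stakeholders rather than hard-coded once and forgotten.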
Ensure the quality of older storage media. Tape in particular can go bad. It is important to monitor the humidity and temperature of your archive area, and equally important to regularly check tapes for evidence of degradation so you can remove them from service before the data they store becomes unreadable.
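Beyond physical monitoring, degradation can also be caught in software. A minimal sketch, assuming archived data has been copied to verifiable storage: record a SHA-256 checksum for each file at archive time, then re-verify on a schedule so silent corruption is found while a clean copy still exists. The function names (`record_checksums`, `verify_checksums`) and manifest format are illustrative assumptions.

```python
import hashlib
import json
import os

def record_checksums(archive_dir, manifest_path):
    """Write a JSON manifest mapping each archived file to its SHA-256 digest."""
    manifest = {}
    for name in sorted(os.listdir(archive_dir)):
        path = os.path.join(archive_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                manifest[name] = hashlib.sha256(f.read()).hexdigest()
    with open(manifest_path, "w") as f:
        json.dump(manifest, f)

def verify_checksums(archive_dir, manifest_path):
    """Return names of files whose current digest no longer matches the manifest."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    corrupted = []
    for name, digest in manifest.items():
        path = os.path.join(archive_dir, name)
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                corrupted.append(name)
    return corrupted
```

Run the verify pass on the same cadence as the physical tape checks; a non-empty result is the cue to restore that file from a redundant copy before it becomes the only one.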