Form a long-term storage strategy: Doing the big data grunt work

Unorganized data of both the structured and unstructured varieties bulges in storage. How do you figure out what stays and what goes?

As organizations craft their big data strategies and launch projects, an undertow of unorganized data of both the structured and unstructured varieties bulges in storage. Data managers are aware of it and cringe. Users ignore it and move on. But sooner or later, the data stockpile has to be dealt with in a comprehensive big data plan that determines which data is worth keeping under management and which should simply be thrown away.

Stay or go

In some cases, the decisions on which data to keep and which to discard are being made for companies. Industry regulators require enterprises to retain data for certain periods of time, as do corporate lawyers concerned about e-discovery and the need to produce email and other historical records for litigation. In other cases, age-old retention policies for individual corporate systems simply keep chugging along.

Beyond these, however, many organizations still face a bottomless data chasm that is a challenge to excavate.

How do you sift through cobwebbed data repositories that may or may not have value, and also create strategies for de-duplicating and sanitizing this data? And can you ever know for sure when the corporate C-suite will demand long-term trending information that depends on historical data gathered over several decades?

Behind this decision making is an almost instinctive drive to throw all of this old data away, coupled with an inherent fear that the data may somehow, someday be needed.

How do organizations get on top of this dilemma?

Research and analytics firm McKinsey talks about the need for a "plan for assembling and integrating data that's frequently horizontally siloed across business units or vertically by function." This data might exist in numerous internal legacy systems, or in combination with new, unstructured data coming in from social media, machines, and other Web sources.

McKinsey comments:

"Making this information a useful and long-lived asset will often require a large investment in new data capabilities. Plans may highlight a need for the massive reorganization of data architectures over time, sifting through tangled repositories, and implementing data governance standards that systematically maintain accuracy."

McKinsey suggests that for the immediate future, companies could outsource the process of sorting through this data accumulation to data specialists who use cloud-based solutions to unify enough data into blocks of actionable information that can facilitate corporate analytics.

The strategy might work. However, organizations will still be left with the persistent decisions of which data they should save forever and which they should throw away. If they make the decision to retain more data in a semi-permanent fashion, they will also need to pay for it by investing in storage for long-term data archiving.

At the same time, CIOs are likely to find few allies when it comes to longer-term investments in data custodianship, and even fewer sympathizers for historical data cleansing, integrations and archeological expeditions that aren't absolutely necessary for big data projects.

The dilemma may well provide a tipping point to cloud-based storage that at last trumps corporate worries about data security and protection in the cloud!

Avoid the looming shadow of the data warehouse


A cloud-based storage strategy for your old data could well prove to be more economical than continuing to keep all of your old data in mothballs on data center storage.

In addition, moving archival data to the cloud (even if the data is largely unknown until it can be assessed) relieves the pain of keeping this data onsite and in a state of stasis until a particular big data trends project calls for it.

By Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...