
Form a long-term storage strategy: Doing the big data grunt work

Unorganized data of both the structured and unstructured varieties bulges in storage. How do you figure out what stays and what goes?

As organizations craft their big data strategies and launch projects, unorganized data of both the structured and unstructured varieties piles up in storage. Data managers are aware of it and cringe. Users ignore it and move on. But sooner or later, the stockpile has to be dealt with in a comprehensive big data plan that determines which data is worth keeping under management and which should simply be thrown away.

Stay or go

In some cases, the decisions on which data to keep and which to discard are being made for companies. Industry regulators require enterprises to retain data for set periods of time, as do corporate lawyers concerned about e-discovery and the need to produce email and other historical records for litigation. In other cases, long-standing corporate systems keep chugging along, each with its own data retention policy.

Beyond these, however, many organizations still face a bottomless data chasm that is a challenge to excavate.

How do you sift through cobwebbed data repositories that may or may not have value, and also create strategies for de-duplicating and sanitizing this data? And can you ever know for sure when the corporate C-suite will demand long-term trending information that depends on historical data gathered over several decades?

Behind this decision making is an almost instinctive drive to throw all of this old data away, coupled with an inherent fear that the data may somehow, someday be needed.

How do organizations get on top of this dilemma?

Research and analytics firm McKinsey talks about the need for a "plan for assembling and integrating data that's frequently horizontally siloed across business units or vertically by function." This data might exist in numerous internal legacy systems, or in combination with new, unstructured data coming in from social media, machines, and other Web sources.

McKinsey comments:

"Making this information a useful and long-lived asset will often require a large investment in new data capabilities. Plans may highlight a need for the massive reorganization of data architectures over time, sifting through tangled repositories, and implementing data governance standards that systematically maintain accuracy."

McKinsey suggests that for the immediate future, companies could outsource the process of sorting through this data accumulation to data specialists who use cloud-based solutions to unify enough data into blocks of actionable information that can facilitate corporate analytics.

The strategy might work. However, organizations will still be left with the persistent decisions of which data they should save forever and which they should throw away. If they make the decision to retain more data in a semi-permanent fashion, they will also need to pay for it by investing in storage for long-term data archiving.

At the same time, CIOs are likely to find few allies when it comes to longer-term investments in data custodianship, and even fewer sympathizers for historical data cleansing, integrations and archeological expeditions that aren't absolutely necessary for big data projects.

The dilemma may well provide a tipping point toward cloud-based storage, one that at last trumps corporate worries about data security and protection in the cloud.

Avoid the looming shadow of the Data Warehouse


A cloud-based storage strategy for your old data could well prove to be more economical than continuing to keep all of your old data in mothballs on data center storage.

In addition, moving archival data to the cloud (even if the data is largely unknown until it can be assessed) provides pain relief from having this data sit onsite in a state of stasis until a big data trends project calls for it.


Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...


Hi Mary, nice post. You are right that the cloud will be a major factor in big data analytics.

Cloud-based analytics are gaining ground among enterprises because of their financial and operational benefits. However, the requirements for big data analysis are often not well understood by developers and business owners, which results in an undesirable product.

The success of extracting people-oriented business intelligence depends on the ability to collect every possible expression and derive business observations from it.
There is a need to develop the expertise and processes to create small-scale prototypes quickly and test them to demonstrate that they are correct and match business goals.

I have registered for a webinar on Deploy Big Data solutions Rapidly in Cloud through Harbinger’s ABC model (Agile-Big Data-Cloud); it looks promising.
Mark W. Kaelin moderator

How does your organization determine what data is important and should be stored long-term and what data can be thrown away and when?
