As organizations craft their big data strategies and launch
projects, an undertow of unorganized data of both the structured and
unstructured varieties bulges in storage. Data managers are aware of it and
cringe. Users ignore it and move on. But sooner or later, the data stockpile
has to be dealt with in a comprehensive big data plan that eventually must
determine which data is worth keeping under management, and which should simply
be thrown away.

Stay or go

In some cases, the decisions on which data to keep and which
to discard are being made for
companies. Industry regulators place demands on enterprises to retain data for
certain periods of time, as do corporate lawyers concerned about e-discovery
and the need to produce email and other historical records for litigation. In
other cases, age-old data retention policies for individual corporate systems
simply keep chugging along, each system with its own rules.

Beyond these, however, many organizations still face a
bottomless data chasm that is a challenge to excavate.

How do you sift through cobwebbed data repositories that
may or may not have value, and also create strategies for de-duplicating and
sanitizing this data? And can you ever know for sure when the corporate C-suite
will demand long-term trending information that depends on historical data
gathered over several decades?

Behind this decision making is an almost instinctive drive
to throw all of this old data away, at the same time that there is inherent
fear that the data may somehow, someday be needed.

How do organizations get on top of this dilemma?

Research and analytics firm McKinsey talks about the need
for a “plan for assembling and integrating data that’s frequently
horizontally siloed across business units or vertically by function.”
This data might exist in numerous internal legacy systems, or in combination
with new, unstructured data coming in from social media, machines, and other
Web sources.

McKinsey comments:

“Making this information a useful and long-lived asset
will often require a large investment in new data capabilities. Plans may
highlight a need for the massive reorganization of data architectures over
time, sifting through tangled repositories, and implementing data governance
standards that systematically maintain accuracy.”

McKinsey suggests that for the immediate future, companies
could outsource the process of sorting through this data accumulation to data
specialists who use cloud-based solutions to unify enough data into blocks of
actionable information that can facilitate corporate analytics.

The strategy might work. However, organizations will still
be left with the persistent decisions of which data they should save forever
and which they should throw away. If they make the decision to retain more data
in a semi-permanent fashion, they will also need to pay for it by investing in storage
for long-term data archiving.

At the same time, CIOs are likely to find few allies when it
comes to longer term investments in data custodianship – and even fewer
sympathizers for historical data cleansing, integrations and archeological
expeditions that aren’t absolutely necessary for big data projects.

The dilemma may well provide a tipping point toward cloud-based
storage that at last trumps corporate worries about data security and
protection in the cloud!


Why?

First, a cloud-based storage strategy for your old data could well
prove to be more economical than continuing to keep all of your old data in
mothballs on data center storage.

Secondly, moving archival data to the cloud (even if the
data is largely unknown until it can be assessed) provides pain relief from
having this data onsite and in a state of stasis until a unique big data trends
project beckons for it.