Taking out the Big Data garbage

Organizations are seeing storage resources being consumed with an avalanche of Big Data at a time when they don't have the policies in place that can help them determine when it is OK to throw Big Data out.

I recently wrote about "finding a mission" for Big Data.

I firmly believe that it will be actual use cases in the business that move IT into a set of best practices for Big Data, and not some theoretical "best fit" processing and storage architecture that IT simply orchestrates for what it thinks Big Data will require.

These use cases will force a set of new inflection points into IT data center thinking that will range from how data is processed and stored to the intrinsic value that the data holds once it is processed. Nowhere is this more important than in the area of data storage, where organizations are seeing storage resources consumed by an avalanche of Big Data at a time when they do not have the policies and guidelines in place that can help them determine when it is OK to throw Big Data out.

One way to illustrate this is by taking two currently popular use cases for Big Data. Both involve processing Big Data in near real-time so that businesses can immediately analyze and respond to what is going on in the outside environments in which they operate.

The first example is an online retailer that wants to track instantaneously which items are selling the most, and in which geographies. In this scenario, the business goal is to instantly present "blue plate" specials that feature the best-selling items to customers in the geographies where those items are selling best.
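To make the retailer scenario concrete, here is a minimal Python sketch of the kind of rollup involved. It assumes sales arrive as simple (region, item, quantity) records; the function name, the field layout, and the sample data are all hypothetical and chosen only for illustration, not taken from any particular retail system.

```python
from collections import Counter, defaultdict


def top_sellers_by_region(sales, n=3):
    """Return the n best-selling items for each region.

    `sales` is an iterable of (region, item, quantity) tuples
    (a hypothetical record layout used only for this sketch).
    """
    totals = defaultdict(Counter)
    for region, item, quantity in sales:
        totals[region][item] += quantity  # accumulate units sold per item
    return {
        region: [item for item, _ in counts.most_common(n)]
        for region, counts in totals.items()
    }


# Made-up sample data standing in for a live sales feed.
sales = [
    ("north", "umbrella", 5),
    ("north", "boots", 2),
    ("north", "umbrella", 4),
    ("south", "sunscreen", 7),
    ("south", "boots", 1),
]

specials = top_sellers_by_region(sales, n=1)
print(specials)
```

In a real deployment the tally would be computed continuously over a stream rather than over a list in memory, but the business question being answered is the same.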

A second example is a city road monitoring system, driven by sensors that monitor the surface conditions of the roads, as well as traffic congestion and throughput, so the city can post warnings advising motorists which routes to take.

Both of these scenarios show Big Data in a "peak" role while it is being used for near real-time analytics so the business can respond to customer demand or traffic conditions. Once these situations terminate, however, the Big Data used for on-the-spot analysis rapidly loses value. Unlike traditional transaction data, this near real-time Big Data is not required to be archived by regulators or standard IT governance practice.

Naturally, this raises the question: can we throw the data out?

From a trends standpoint, the initial answer is: not immediately. In fact, what IT should be proactively doing is working together with end users to 1) design analytics reports that capture what the data told the company about a given sale or traffic condition at a specific point in time and in a specific situation; 2) store this summary information in a data mart for future longer-term trends analysis; and 3) orchestrate a strategy to get rid of the foundational Big Data that has now lost its immediate usefulness and can be removed.
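The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for both the raw Big Data store and the data mart, and the table names (raw_events, sales_summary), the column layout, the function name, and the sample rows are all hypothetical.

```python
import sqlite3


def summarize_and_purge(conn, cutoff):
    """Roll up raw events older than `cutoff` into the summary table,
    then delete those raw events. Returns the number of rows purged.
    """
    with conn:  # one transaction: the rollup and the purge succeed or fail together
        conn.execute(
            """
            INSERT INTO sales_summary (day, region, item, units)
            SELECT day, region, item, SUM(quantity)
            FROM raw_events
            WHERE day < ?
            GROUP BY day, region, item
            """,
            (cutoff,),
        )
        purged = conn.execute(
            "DELETE FROM raw_events WHERE day < ?", (cutoff,)
        ).rowcount
    return purged


# Hypothetical schema and made-up sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events (day TEXT, region TEXT, item TEXT, quantity INTEGER)"
)
conn.execute(
    "CREATE TABLE sales_summary (day TEXT, region TEXT, item TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "north", "umbrella", 5),
        ("2024-01-01", "north", "umbrella", 4),
        ("2024-01-01", "south", "sunscreen", 7),
        ("2024-01-02", "north", "boots", 2),
    ],
)
conn.commit()

purged = summarize_and_purge(conn, cutoff="2024-01-02")
summary = conn.execute(
    "SELECT day, region, item, units FROM sales_summary ORDER BY region"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
print(purged, summary, remaining)
```

The design choice worth noting is that the summary is written and the raw rows are deleted inside a single transaction, so the detailed data is never discarded unless its rollup has been safely stored. How long the raw data should be kept before the cutoff is a policy decision for the business and IT to make together, not something the code can decide.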

These steps constitute a follow-up work process for Big Data analytics that addresses both the immediate need to know and the need for the business to go back for reviews and studies of trends. Equally important for IT is the last step, which addresses "taking out the data garbage." Ironically, this is a step that often gets missed in Big Data projects. It shouldn't be, because it's one of the best ways to ensure that IT Big Data resources are always optimized for the best return on investment.