Purge vs. repurpose data exhaust: How to find the right balance

Discover what IT pros must consider when they're thinking about whether they can repurpose data exhaust for immediate or foreseeable business advantages.

Data exhaust is unstructured data generated as a by-product of the online activities of internet users or machines. It can take the form of log files, temporary files, cookies, and even machine-, process-, and transaction-generated data that is not part of the immediate data capture for machine-based activities.

More companies are beginning to pay attention to data exhaust, because if it is plugged into the right business case, it can be beneficial to corporate strategies and operations.

For instance, a global manufacturer that links manufacturing activities in Taiwan and Germany can have machines on factory floors in both locations talking to each other and controlling manufacturing processes. In addition, it can use the data exhaust from these machines (which also flows over the internet) to predict machine failures before they happen.

New York Air Brake uses by-product machine data for remote freight train monitoring. The data reveals how energy-efficiently trains are running, and it provides insights into engineers' driving habits and how these can be optimized to reduce train energy consumption. The company reported that by using this data exhaust, it improved the energy efficiency of its trains by 8-12%.

More commonly, marketing departments can learn more about their consumers if they track some of the Internet of Things (IoT) activities around the transactions that consumers execute. For instance, from what internet destination did the consumer enter the website? What time of day was it? What other events were going on at that time? What was the consumer's navigation pattern on the website? If the consumer was shopping, what other items did he or she look at?

In the end, harnessing data exhaust for business purposes boils down to understanding which types of unstructured data are nonessential to immediate business processes, what these pieces of data are, and how they can be put to productive use in other important business scenarios.

If you can't find a way to repurpose this data exhaust for immediate or foreseeable business advantage, you should likely get rid of it for two good reasons. First, according to a 2014 IDC study sponsored by EMC, the digital universe is doubling in size every two years and will multiply 10 times over between 2013 and 2020 (i.e., from 4.4 trillion gigabytes to 44 trillion gigabytes). Second, a large share of data exhaust, which contains machine handshakes and handoffs, network status, etc., becomes rapidly outdated and has negligible future value.
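As a rough sanity check on those growth figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the only inputs are the numbers quoted above (4.4 trillion gigabytes equals 4.4 zettabytes).

```python
import math

# Figures quoted from the 2014 IDC study; variable names are illustrative only.
start_zb = 4.4              # 2013 size: 4.4 trillion GB = 4.4 zettabytes
years = 2020 - 2013         # 7-year span
doubling_period_years = 2   # "doubling in size every two years"

# Growth implied by strict doubling every two years over seven years.
growth_factor = 2 ** (years / doubling_period_years)
projected_zb = start_zb * growth_factor

# Doubling time implied by an exact tenfold increase over the same span.
implied_doubling_years = years * math.log(2) / math.log(10)

print(f"growth factor: {growth_factor:.1f}x")
print(f"projected 2020 size: {projected_zb:.1f} ZB")
print(f"doubling time implied by 10x: {implied_doubling_years:.2f} years")
```

Strict two-year doubling works out to a little over 11 times growth, so the "10 times" figure corresponds to a doubling time slightly longer than two years. Either way, storage that is never purged grows by roughly an order of magnitude over the period.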

The bottom line

IT needs a plan for data exhaust. The plan should consist of these five elements.

  1. Educate managers in business areas throughout the company about data exhaust -- what it is, how it can produce value, and the hazards of letting this data indiscriminately pile up, which consumes dollars and resources.
  2. Advocate a balanced approach that weighs identifying specific business cases where data exhaust can bring benefit against the principle of "taking out the trash" at timely intervals, so your storage isn't overrun by data with minimal value that never gets purged.
  3. Develop a methodology where end users and IT meet periodically to identify business uses for data exhaust.
  4. Set business goals and metrics for every data exhaust project, and then measure against the goals and metrics to see if the project meets its objectives.
  5. Create guidelines for purging data exhaust if it is not being used and is unlikely to be used.
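
A purging guideline like the one in step 5 could be expressed as a simple rule: flag any data exhaust that has no approved business case and has sat unused beyond a set idle window. The sketch below is a minimal illustration under that assumption. The `ExhaustDataset` fields, the dataset names, and the 90-day window are all hypothetical; a real policy would also need to account for legal and compliance retention requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ExhaustDataset:
    """Hypothetical inventory entry for one collection of data exhaust."""
    name: str
    last_used: date          # last time any project read this data
    has_business_case: bool  # True if an approved project depends on it


def select_for_purge(datasets, today, max_idle_days=90):
    """Return names of datasets with no business case that were last used
    before the idle cutoff (the 90-day default is an assumption)."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        d.name
        for d in datasets
        if not d.has_business_case and d.last_used < cutoff
    ]


# Illustrative inventory; the names and dates are made up.
inventory = [
    ExhaustDataset("network_handshake_logs", date(2024, 1, 5), False),
    ExhaustDataset("machine_sensor_byproducts", date(2024, 1, 5), True),
    ExhaustDataset("web_session_temp_files", date(2024, 5, 20), False),
]

to_purge = select_for_purge(inventory, today=date(2024, 6, 1))
print(to_purge)
```

Here only the handshake logs are flagged: the sensor data is protected by its business case, and the session files were used too recently. Keeping the business-case flag separate from the idle window mirrors the balance described above, since data tied to an active project is never purged on age alone.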

This combination of actively searching for data exhaust applications while also getting rid of data when it is of minimal value is a "balanced" approach that every IT organization should consider.
