Harvest big data in two ways for different objectives

Mary Shacklett makes the case that enterprises should consider looking at big data harvesting in the "short way" and the "long way."


Data science's legacy is its ability to investigate complex problems in mathematics, statistics, and computer science that historically appeared in genome research, pharmaceutical research, intergalactic explorations, analyses of global warming and the structure of the earth, etc. These investigations used man- and machine-made data, and the domains of this data were seemingly infinite. In a research or academic setting, there was a tacit understanding that the goal was data exploration, with the hope of unearthing rare data gems that would pave the way to breakthroughs in a variety of fields.

On the other end of this spectrum was the acknowledgement that perhaps no meaningful answers would be found after years of probing data with a plethora of algorithms and analyses. An example of this is SETI (the search for extraterrestrial intelligence), which began in the 1960s with a cadre of volunteers analyzing radio signals from space for any anomalies that could indicate a communication pattern and the possibility of other life. No evidence of extraterrestrial life has been found, but the search and the data science behind it continue.

In the enterprise world, the fierce competitiveness of business and the clamoring of boards and stakeholders for favorable quarterly results by necessity focus executives on the short term. They simply can't afford to be a SETI, or any other open-ended academic or research endeavor that may not produce any usable results.

Because of this, most organizations set their big data goals within a realm of expectation where they feel confident that their big data forays will yield results that are immediately actionable or that predict future trends. In the process, they discard big data that they feel is extraneous to their data cultivation processes, since they have to pay for processing and storage. The process is sensible and more affordable, and it brings big data project results back to eagerly waiting boards and stakeholders.

But in a recent discussion, Michael Hiskey, a data scientist and vice president of data analytics for MicroStrategy, presented another perspective.

"I really don't think organizations should throw any of their big data away," Hiskey said, "because you never know where your innovation and queries could lead you in the future -- what data you'll need to examine new questions -- or what you might find."

There is an argument that enterprises need to retain all of their data, even when it does not appear to be useful, because questions that no one has thought of today may need to be asked of that data in the future. If enterprises keep their data options open by not prematurely getting rid of data they perceive as "useless," they may be able to capitalize on future data research breakthroughs that could lead to unique competitive advantages in the marketplace, and even to new product innovations.

Hiskey described how a pill manufacturer probed deeper into its big data and came up with the idea of embedding intelligence in a pill that could track where the pill was and whether the patient had consumed it (pill taking at home -- and keeping track of it -- is a major challenge for many patients). Apparently, the idea that hatched from looking at big data had not previously been considered.

The takeaway

Enterprises should consider looking at their big data harvesting in more than one way. There is the "short way" that yields immediate results for day-to-day operations management and longer-term trend prediction, and the "long way" that holds no guarantee of success or results, but that has the ability to deliver an unexpected intelligence gem that could transform a company.

Historically, CEOs and CIOs have struggled to get R&D dollars when no return on investment could be promised. If they are going to consider this now (especially if they are in an industry that doesn't place a high value on R&D), they will have boards, stakeholders, and possibly even themselves to convince. However, if they manage to carve out an unencumbered "data sandbox" operation alongside their day-to-day big data operations, they could position themselves for the best of both worlds.

By Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...