
Big Data: The perils of past performance

With Big Data especially, pundits and vendors imply that if we throw bigger and better data at a faster platform, we'll eventually be able to predict the future with near certainty. Here's why this is wrong.

One of the great and often ignored challenges of Big Data is whether or not historical data is actually relevant to answering a given question. Anyone who has ever glanced at the fine print of their stockbroker's web page or an investment prospectus has likely noticed the old quip that "Past performance does not guarantee future results." Despite a century of increasingly complex analytics and reporting, this disclaimer has largely held true; otherwise we'd all happily pick yesterday's market winner, then begin researching which island to buy as the returns poured in.

While this is common sense for the average investor, in IT we occasionally forget the perils of past performance. With Big Data especially, pundits and vendors imply that if we throw bigger and better data at a faster platform, we'll eventually be able to predict the future with near certainty.

A crisis of philosophy?

It's rare that we get to talk about a topic like philosophy when dealing with data and technology management, but discussions around Big Data should include a pause to consider the philosophies that IT and the larger organization hold around data. In IT especially, we are used to solving technical problems that usually have a defined answer. With the right resources, we're able to overcome most technical challenges or eventually discover that those resources are simply not economical. With Big Data, we tend to apply this veneer of certainty to what amounts to predicting the future.

Superficially, Big Data looks like any other technical problem. There are IT resources to be marshaled, plans to be created and executed, development work to be done, and people to be managed. It's easy to assume there's a "right" answer to Big Data and to put an unflinching faith in the ability of the system to do what amounts to predicting the future.

Big Data can also put your focus in the wrong place. Most of the events that shape the future are external: a competitor may enter a market or launch a new product, an economic or social calamity may reshape markets, or anything from a war to a demographic shift to an alien sighting could change the future overnight. Tools like Big Data focus on past performance. Even with Big Data's promise of near real-time analysis, you're still working with "old" information, even if it is only microseconds old.

Keeping focused

Part of the fervor over Big Data is similar to the latest "surefire" investment scheme: we all want an ability to peer into the future, and any individual or organization that can predict the future will obviously have a dramatic leg up on the competition. However, it's incumbent on IT leaders to temper some of the enthusiasm around Big Data and ensure that it's presented as a useful and effective tool for understanding past performance, but one that cannot predict future results. With any new reporting tool, it's tempting to put faith in the technology and focus inward rather than keep a metaphorical finger in the wind, attempting to spot the next shift. As one of the more rational and technologically inclined groups in most companies, IT leadership is well positioned to provide this organizational gut check.

As your history teacher likely admonished, learning from the past is critical to understanding the future, but wars end, economies change, and demographics shift. Just as we need both historians and prognosticators, your Big Data initiatives need one eye firmly focused on the future.

About

Patrick Gray works for a global Fortune 500 consulting and IT services company, and is the author of Breakthrough IT: Supercharging Organizational Value through Technology, as well as the companion e-book The Breakthrough CIO's Companion. Patrick has...

10 comments
DesertJim

So there was a whole flock of turkeys. Every day the farmer would come and feed them corn, and they were basically afraid of him. But one turkey noticed that if he ran up to the farmer instead of running away, he got more corn and didn't have to compete with the other turkeys for it. This happened every day, and he got bolder and bolder, eventually eating out of the farmer's hand. Soon he was the biggest turkey in the flock, and he was able to wait for the farmer and choose his corn from the bucket. Every day he learned from yesterday and got bolder and bigger. Until Christmas Eve... (For turkeys, substitute DEC, Microsoft, Nokia, RIM (and Apple?).)

tomritchey

The epistemological issues of predicting social-technological development are taken up in this recent publication: “Wicked Problems – Social Messes: Decision Support Modelling with Morphological Analysis”, Springer, 2011. You can see a description at: http://www.springer.com/business+%26+management/technology+management/book/978-3-642-19652-2 Regards, Tom Ritchey, SweMorph

p.gygi

This belief parallels the classical physics view of the universe, held before the advent of quantum mechanics, in which all things could in principle be predicted.

waltersokyrko

Many organizations cannot implement little data correctly: no MDM; the same data entered manually into multiple systems, causing errors; the same field defined differently in different systems; no incremental loading from operational systems to reporting systems; inability to aggregate information from different organizational units. Many organizations have no hope of implementing big data in the foreseeable future.

gcottman

Being unable to predict the future does not make it impossible to derive value from data analytics. Much of it comes down to the scope of the response to the findings of such analysis. Would I bet my house on a sure-fire investment plan arising from analysis of past data? Certainly not! Would I take a closer look at credit card usage that correlated very strongly with past patterns of fraud? Very probably. As Elijah said, we already make decisions all the time. The inability to make a decision that is guaranteed to be correct does not remove the necessity of making your decisions on the best data available.

HAL 9000

The world ends on December 21, 2012, according to the Big Data analysts, so we don't have much time to wait. We may as well all get on and do what we want instead of what the boss wants. After all, what's it going to matter come December 22? ;) Col

RudHud

This sort of thing was a big deal in the social sciences in the 1960s -- using factor analytic techniques to extract "dimensions" from Big Data. By the time I hit grad school in 1975 it was already a flop. Why? The data were collected using theories as to what was important. And what, exactly, were those theories? Good luck finding any documentation at all -- the people who defined the data collection probably weren't aware of them. They were there nonetheless. And that's what you found with your factor analysis -- exactly what the unspoken, unchallenged theories of the mediocre trudgers put in. If you asked them why, all they would be able to say was, "It's obvious." So you find out what's obvious to the ignorant and dull. Harsh but true. Instead, try starting with a sharp, clever theory, and refining it with testing.

ElijahKam

I am definitely not an expert on business computing, but I do know something about the social sciences. I would think "Big Data" should be used to detect relationships and correlations that would otherwise not be apparent. It is not so much a question of predicting the future as of trying to determine what is going on right now. Actually, I think businesses do this all the time, although they often neglect to take advantage of the analysis.

jhughes2020

Instead of using Big Data analysis to predict future results, perhaps we should try using Big Data analysis to discover what not to do.

vegesm

What is new (at least I hope it has changed since back then; otherwise you're right) is that collecting data now means collecting all available data: everything which is even remotely related to our goals. Twenty or thirty years ago we didn't have the infrastructure to store all that data, let alone process it.