Why big data analytics strikes out sometimes

Biases, imperfect data structures, and computing failures are just some of the reasons why big data analytics can be inaccurate.

Oakland A's coach Tye Waller talking about his database for the baseball team.
Image: Daniel Terdiman/CNET

Babe Ruth hit 714 home runs in his Major League Baseball career. During the same period, he also struck out 1,330 times. In this respect, big data is like the Babe. Many times, big data "hits a home run" and transforms a company; sometimes, it strikes out.

For instance, did big data analytics foresee the recent plunge in oil prices? Not according to Saudi billionaire businessman Prince Alwaleed bin Talal, who said in a January 2015 interview with USA Today: "Saudi Arabia and all of the countries were caught off guard. No one anticipated it was going to happen. Anyone who says they anticipated this 50% drop (in price) is not saying the truth."

Google Flu Trends missed badly in 2013, when it forecast roughly twice as many flu cases as the Centers for Disease Control and Prevention reported. "Among the underlying problems was that Google assumed a constant relationship between flu-related searches and flu prevalence, even as the search technology changed and people began using it in different ways," wrote David Lazer, a professor of political science and computer science at Northeastern University, in a 2014 MIT Technology Review article. "That failure is the big-data era's equivalent of the Chicago Tribune's 'Dewey Defeats Truman' headline in 1948."
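To make the "constant relationship" problem concrete, here is a minimal sketch in Python. It is not Google's model, and every number in it is invented for illustration (none of it is Google or CDC data). It assumes a fixed cases-per-search ratio learned in one period and then applies that ratio after search habits have changed; the function names are hypothetical.

```python
# Illustrative sketch only: all figures are invented, not Google or CDC data.

def fit_constant_ratio(searches, cases):
    """Estimate one fixed cases-per-search ratio from a calibration period."""
    return sum(cases) / sum(searches)

def predict_cases(ratio, searches):
    """Apply the fixed ratio to new search volumes."""
    return [ratio * s for s in searches]

# Calibration period (hypothetical): one flu case per ten flu-related searches.
calib_searches = [1000, 2000, 3000]
calib_cases = [100, 200, 300]
ratio = fit_constant_ratio(calib_searches, calib_cases)

# Later period (hypothetical): people search twice as often per actual case,
# but the model still assumes the old ratio holds.
later_searches = [2000, 4000, 6000]
actual_cases = [100, 200, 300]
predicted = predict_cases(ratio, later_searches)

# How far the forecast overshoots the cases that actually occurred.
overestimate = sum(predicted) / sum(actual_cases)
print(f"fixed ratio: {ratio:.2f}, overestimate factor: {overestimate:.1f}")
```

In this toy setup the forecast overshoots by exactly the amount that search behavior shifted, which is why a model of this kind needs to be recalibrated whenever the way people use the underlying data source changes.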

The good news is that the more companies use big data, the more they learn about the strengths and weaknesses of big data in the same way that baseball managers learn about the strengths and weaknesses of their hitters.

Here are some of the kernels of wisdom that companies are gleaning as they gain experience with big data.

  • Never forget that big data analytics is the result of the questions and data structures that data scientists and business analysts put in motion. The machine is faster, but in the end, it simply carries out the orders. It can only go as far as the minds that drive it, so you still have to think about what an analysis might have missed.
  • If you're using big data in real time as a mission-critical application, you still need a manual failover plan. Both on-premises and cloud-based computing can fail. If that happens, you need knowledgeable people on staff who can take the helm of an operation and run it.
  • Big data analytics is still in an early learning period when it comes to predicting human behavior. Modern humans have been on the planet for 200,000 years, and we still haven't figured out why we act the way we do! Even with associational "thinking" and processing, machines are limited in predicting human behavioral outcomes.

"Data and data sets are not objective; they are creations of human design," wrote Kate Crawford in a 2013 Harvard Business Review article. "We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves."

Does this mean that companies should limit their big data uses? Most assuredly, no. It is simply a reminder that big data, like all other data-related projects, should be approached circumspectly, and that it should not be an end in itself.

Disclaimer: TechRepublic, ZDNet, and Tech Pro Research are CBS Interactive properties.

About Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President o...
