How to quiet big data noise

Follow this expert advice on how to refine your big data and your team so you understand which insights you really need.

Michael Vaughan, co-president of Regis, a business simulation and experiential learning company, recently recalled Kurt Vonnegut's short story "Harrison Bergeron."

I remember it, too. It was a futuristic portrait of a society that strove to make everyone equal. Radio earpieces that blasted loud noises were installed on anyone more intelligent than the norm, so that they could think only average thoughts.

I couldn't help but liken this to big data, because companies with average big data efforts are trying to plumb average big data for extraordinary insights, and they are having a tough time getting the job done.

"It's difficult to get insights out of a huge lump of data," said Maksim Tsvetovat, a data scientist at Intellectsoft, an enterprise software development company. "There has to be a discernible signal in the noise that you can detect, and sometimes there just isn't one. Once we've done our intelligence on the data, sometimes we have to come back and say we just didn't measure this right or measured the wrong variables because there's nothing we can detect here."

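Tsvetovat's question, whether a discernible signal exists at all, can be checked before anyone goes hunting for insights. One common technique is a permutation test: shuffle one column many times and count how often the random pairings look as strongly related as the real data. The following is a minimal, stdlib-only Python sketch that assumes two numeric columns and uses Pearson correlation as the measure of signal; it is an illustration, not Intellectsoft's method, and the function names are hypothetical.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant column cannot carry a signal
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often shuffled, signal-free pairings of xs and ys look at
    least as correlated as the real pairing. A large value suggests noise only."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed pairing itself, so the estimate is never 0.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(0, 1) for _ in range(200)]
    with_signal = [2 * x + rng.gauss(0, 1) for x in xs]
    pure_noise = [rng.gauss(0, 1) for _ in range(200)]
    print("with signal:", permutation_p_value(xs, with_signal))  # very small
    print("pure noise:", permutation_p_value(xs, pure_noise))    # usually large
```

A large p-value means random shuffles routinely match the real relationship, which is roughly the situation Tsvetovat describes: the variables, as measured, carry little detectable signal, and it may be time to revisit what was measured.
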
Getting no results from big data analytics is a risk that IT and data science groups face every day. So what steps can you take to reduce that risk?

Here are three key steps to quiet the noise and manage large amounts of data more efficiently. Interestingly, most of them have more to do with data quality and people than with investing in more technology.

Work with high-quality data

Data is compromised when it is misspelled, duplicated, invalid, missing, and so on. When bad data is fed into big data algorithms, the results are bad, and bad decisions can follow.
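The four kinds of bad data named above (misspelled, duplicated, invalid, missing) can often be flagged with simple rule-based checks before records reach an analytics model. Here is a minimal Python sketch; the field names, the `VALID_STATES` vocabulary, and the email pattern are hypothetical examples, and real rules would come from your own data.

```python
import difflib
import re

# Hypothetical reference data and rules, for illustration only.
VALID_STATES = {"California", "New York", "Texas"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("customer_id", "email", "state")


def audit_records(records):
    """Return a list of (row_index, problem) tuples describing bad data."""
    problems = []
    seen_ids = {}
    for i, rec in enumerate(records):
        # Missing: a required field is absent or blank.
        for field in REQUIRED_FIELDS:
            if not str(rec.get(field) or "").strip():
                problems.append((i, f"missing {field}"))
        # Duplicated: the same customer_id appeared in an earlier row.
        cid = rec.get("customer_id")
        if cid:
            if cid in seen_ids:
                problems.append((i, f"duplicate of row {seen_ids[cid]}"))
            else:
                seen_ids[cid] = i
        # Invalid: the email does not match a basic pattern.
        email = rec.get("email")
        if email and not EMAIL_RE.match(email):
            problems.append((i, "invalid email"))
        # Misspelled: the state is unknown but may be close to a valid entry.
        state = rec.get("state")
        if state and state not in VALID_STATES:
            close = difflib.get_close_matches(state, sorted(VALID_STATES), n=1)
            hint = f" (did you mean {close[0]}?)" if close else ""
            problems.append((i, f"unrecognized state {state!r}{hint}"))
    return problems


if __name__ == "__main__":
    sample = [
        {"customer_id": "C1", "email": "ann@example.com", "state": "California"},
        {"customer_id": "C2", "email": "bob@example", "state": "Califronia"},
        {"customer_id": "C1", "email": "ann@example.com", "state": "California"},
        {"customer_id": "C3", "email": "", "state": "Texas"},
    ]
    for row, problem in audit_records(sample):
        print(row, problem)
    # 1 invalid email
    # 1 unrecognized state 'Califronia' (did you mean California?)
    # 2 duplicate of row 0
    # 3 missing email
```

Each check runs independently, so one row can collect several problems. That makes it easier to route issues to whoever owns each field instead of silently dropping rows.
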

I found this out firsthand when I was CIO at a financial institution. A credit card fraud analytics program flagged a board member's card as suspicious while he was going through the checkout line at a big-box store. The flag was a false positive, but his transaction was denied, and the incident was very embarrassing. The dirty data and the analytics software didn't have to explain the mistake; I did.

These situations happen in companies every day when data isn't cleaned and prepared before analytics queries it. Big data is even harder to clean, because it comes in many different shapes and in far greater volume.
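Part of what makes big data harder to clean is that the same entity can arrive in different shapes from different systems. A common approach is to map every source into one shared schema, normalize the values, and only then deduplicate. The Python sketch below assumes two hypothetical sources, `crm` and `web`, with made-up field names; it illustrates the ordering of the steps rather than any particular tool.

```python
# Hypothetical per-source mappers: each converts one source's record shape
# into a shared schema with the fields customer_id and email.
def from_crm(rec):
    return {"customer_id": rec["CustID"], "email": rec["EMail"]}


def from_web(rec):
    user = rec["user"]  # the web source nests its fields one level down
    return {"customer_id": user["id"], "email": user["contact"]}


NORMALIZERS = {"crm": from_crm, "web": from_web}


def clean(rec):
    """Normalize values so equivalent records compare equal."""
    return {
        "customer_id": str(rec["customer_id"]).strip().upper(),
        "email": str(rec["email"]).strip().lower(),
    }


def prepare(batches):
    """batches: iterable of (source_name, records). Returns deduplicated rows."""
    seen = set()
    out = []
    for source, records in batches:
        to_common = NORMALIZERS[source]
        for raw in records:
            rec = clean(to_common(raw))
            key = (rec["customer_id"], rec["email"])
            if key in seen:
                continue  # same entity already captured from another source
            seen.add(key)
            out.append(rec)
    return out


if __name__ == "__main__":
    batches = [
        ("crm", [{"CustID": " c1 ", "EMail": "Ann@Example.com"}]),
        ("web", [
            {"user": {"id": "C1", "contact": "ann@example.com "}},
            {"user": {"id": 7, "contact": "bob@example.com"}},
        ]),
    ]
    for row in prepare(batches):
        print(row)
```

Deduplicating after normalization is the key design choice here: ` c1 ` and `C1`, or `Ann@Example.com` and `ann@example.com `, only match once casing and whitespace are consistent.
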

Ensure workers have the right data analytics skills

Finding skilled data scientists and analysts remains a major challenge for organizations, so it's important to create a strategic skills development plan. First, identify the data analytics skills you want in a candidate, and then hire based on those requirements. Second, identify the top performers in your organization who are most likely to learn the analytics skills you need, and then help those employees succeed.

A third approach is to partner with local colleges and universities that run analytics programs and recruit some of their top students as interns, who can transition into full-time employment if the internships work out well.

A fourth strategy is to retain outside analytics consultants, with the caveat that they help develop your internal employees so that your staff can eventually perform the analytics themselves.

Identify the most business-savvy people in your organization

A financial analyst in your organization might know the most about the fine details that factor into a risk assessment decision because he or she works with these details every day. If you're in sales, your most customer-savvy person might be the person at the customer service help desk who works daily with customers.

In many cases, these are the people who should help decide how to look at the data and which business elements matter.

Key takeaways

In "Harrison Bergeron," noise was used to suppress exceptional thinking. With your data and your talent, strive for the opposite: high quality, with little noise to hamper insights. High-quality, "noiseless" data comes from properly preparing and vetting data for accuracy. Quality talent comes from identifying the people who possess unique business savvy and can ask the right questions of the data. Once you hit your targets in these two areas, quality insights should follow.
