If you’re in the big data business, there’s a huge privacy issue that isn’t addressed as often as it should be.

The privacy topic that makes the most headlines is the embarrassment your company will suffer if there’s a data breach. Other privacy topics that get a lot of coverage are the risk of discrimination (i.e., your algorithms show a discriminatory and illegal bias), inaccurate analysis due to fake news, and identity reverse engineering (i.e., undoing anonymization). While I agree these are significant issues that are exacerbated by big data, a bigger concern is what I call oracular responsibility.


Why big data is a big privacy issue

Big data analytics has the power to provide insights about people that go far beyond what they know about themselves. And, as Stan Lee wrote, “with great power there must also come–great responsibility.” Such is the responsibility of the oracle; thus, oracular responsibility. In fairness, this problem existed before big data, but it wasn’t a huge risk until big data analytics gave us the tools and techniques to make highly accurate predictions.

Let’s consider the DIKW (Data, Information, Knowledge, Wisdom) Pyramid. When most people talk about data privacy, their biggest concerns are actually with data, as it would be defined in the DIKW Pyramid. My social security number is probably sitting in multiple databases out there and if one of those databases is breached, I’ll have a huge problem.

The next level of the pyramid is information; this is where we start making actionable inferences from the data. Once you start analyzing users’ behaviors, people are going to be freaked out even more. It gets worse.

The next level up is knowledge, which is where you start connecting the dots from different areas of a user’s life–their interests, shopping habits, political views, religious views, associates, professional development.

The most sophisticated practitioners of big data analytics go all the way up the pyramid to wisdom, where this knowledge is tracked over time and curated into a very personal profile. Breach or not, most people would feel very uncomfortable knowing that someone or something knows that much about them. I consider this the biggest privacy issue faced by those practicing the dark arts of big data analytics.


Show your cards

It’s important to be fully transparent with the subjects that you study. They might be your actual customers, or they might not be. You might be analyzing one group of people for the benefit of another group of people. In any case, it’s important to be upfront with the people you study and analyze. Cathy O’Neil, former Wall Street quant and author of Weapons of Math Destruction, explains the high risks of a big data cocktail containing opacity, scale, and damage. This poison is neutralized with transparency, which clears up the opacity.

As uncomfortable as it may be, a prominent aspect of your responsibility is to be honest with your subjects. Let them know that you study them. Let them know what your analytic capabilities are. Let them know what you know about them (in general terms), where you get your information, and your analytic reasoning.

This means you shouldn’t use completely black-box techniques like neural networks. You might build the most accurate neural network in the world, but if you can’t offer up some sort of explanation or rationale around its conclusions, then you’re just as in the dark as your subjects, which is not good. I know this sounds like a lot of information to share–and it is–so you must be careful not to overdo it.
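To make the point about explainability concrete, here is a minimal sketch of a model whose conclusions you can actually explain: a simple weighted score that reports how much each input contributed. Everything in it is hypothetical and for illustration only; the feature names, the weights, and the `explain_score` helper are assumptions, not taken from any real system.

```python
# Hypothetical weights for an illustrative "purchase interest" score.
# These names and numbers are made up for this example.
WEIGHTS = {
    "visits_per_week": 0.5,
    "items_in_cart": 1.0,
    "days_since_purchase": -0.1,
}


def explain_score(features):
    """Return the total score and each feature's contribution to it."""
    contributions = {
        name: WEIGHTS[name] * value for name, value in features.items()
    }
    return sum(contributions.values()), contributions


score, reasons = explain_score(
    {"visits_per_week": 4, "items_in_cart": 2, "days_since_purchase": 10}
)

# List the reasons from most to least influential, ignoring sign.
for name, contribution in sorted(
    reasons.items(), key=lambda item: abs(item[1]), reverse=True
):
    print(f"{name}: {contribution:+.1f}")
print(f"total score: {score:.1f}")
```

Because the score is just the sum of its contributions, you can tell a subject in plain terms which behaviors drove a conclusion about them, which is the kind of rationale a pure black box can’t give you.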


But don’t give away the farm

Don’t feel so compelled by transparency that you give away your strategic secrets. After all, you are in business–a very competitive business. If you give away too much information, your competitive value is eroded. You must find a way to be transparent, while keeping the secret sauce behind the firewall. Here’s how.

Your IT leadership team should launch proactive communication campaigns, which could include PR, speaking, social media, and outreach programs; the more the better. Explain more about what you can do than how you do it. At a minimum, it’s your responsibility to let people know what you know about them and what you’re capable of doing with your analytics. For instance, if location analytics allows you to know where they are and where they’re likely to go next, then let users know you have this technology.

You should also share your prediction accuracy; this will help with reasoning in the absence of methodology. You don’t have to disclose everything about your methods, but if parts aren’t proprietary or particularly sophisticated, let them know. For example, if you’re merging their Facebook data with their Twitter data to get a better understanding of their interests, you should share that information with them. This level of transparency won’t clear you of privacy issues, but it will go a long way to build trust with the community that needs and deserves it.
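As a rough sketch of what sharing your prediction accuracy could look like, the snippet below measures accuracy on a handful of past predictions and folds it into a plain-language notice that says what you can do and where the data comes from, without saying how you do it. The sample data, the `accuracy` and `disclosure` helpers, and the wording of the notice are all hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that matched what actually happened."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted)


def disclosure(capability, sources, acc):
    """Build a notice describing what you can do, not how you do it."""
    return (
        f"We can {capability}. "
        f"We base this on: {', '.join(sources)}. "
        f"In our last evaluation, {acc:.0%} of these predictions were correct."
    )


# Made-up evaluation data: predicted next location vs. actual next location.
predicted = ["gym", "office", "home", "cafe", "home"]
actual = ["gym", "office", "home", "home", "home"]

acc = accuracy(predicted, actual)
notice = disclosure(
    "predict where you are likely to go next",
    ["your app location history"],
    acc,
)
print(notice)
```

The notice states the capability, the data source, and the accuracy, and leaves the methodology out, which keeps the secret sauce behind the firewall.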



It’s no secret that data privacy is a huge concern for companies that deal with big data. But most of the concerns people talk about today are ones every company has: breaches, discrimination, and unfair analysis. A huge privacy issue that isn’t talked about enough is that you know more about people than they know about themselves.

Big data analytics introduces the ability to know so much about somebody that it’s frightening. It’s your oracular responsibility to disclose your powers and then allay the concerns people will inevitably have. Be very open and transparent about your business with the people you’re analyzing, without giving away the corporate secrets that keep you competitive. I’ve given you some techniques for how to accomplish this. If you’re in the business of big data, start owning that responsibility today by organizing an outreach program.

Nobody likes a phony psychic–even one that’s branded as a high-tech company using fancy analytics.
