Big data is not only big; it is also powerful and error-prone, notes Susan Etlinger, an industry analyst with Altimeter Group, in her 2014 TED talk. “At this point in our history… we can process exabytes of data at lightning speed, which also means we have the potential to make bad decisions far more quickly, efficiently, and with far greater impact than we did in the past.”

Besides the potential for bad decisions, Etlinger believes that people place too much faith in technology. One example is our blind acceptance of charts and graphs derived from big data analysis.

An ethical framework for big data analysis

As for what might be done to improve the situation, Etlinger and Jessica Groopman write in their Altimeter report The Trust Imperative: A Framework for Ethical Data Use (PDF) that businesses and organizations building or using big-data platforms need to start adhering to ethical principles.

To incorporate ethics, Etlinger and Groopman suggest studying The Information Accountability Foundation’s (IAF) paper A Unified Ethical Frame for Big Data Analysis, and paying particular attention to the following principles (Figure A).

Figure A

1. Beneficial

“Data scientists, along with others in an organization, should be able to define the usefulness or merit that comes from solving the problem so it might be evaluated appropriately.” (IAF)

“The first principle for ethical data use is that it should be done with an expectation of tangible benefit,” write Etlinger and Groopman. “Ideally, it should deliver value to all concerned parties–the individuals who generated the data as well as the organization that collects and analyzes it.”

The authors offer Caesars Entertainment as an example. Joshua Kanter, senior vice president of revenue acceleration at Caesars Entertainment, says, “Before conducting any new analysis, we ask ourselves whether it will bring benefit to customers in addition to the company. If it doesn’t, we won’t do it.”

2. Progressive

“If the anticipated improvements can be achieved in a less data-intensive manner, then less-intensive processing should be pursued.” (IAF)

The value of progressiveness, according to Etlinger and Groopman, depends on the following:

  • The expectation of continuous improvement and innovation: In other words, what organizations learn from applying big data should help deliver better and more valuable results.
  • Minimizing data usage: Businesses should use the least amount of data necessary to meet the desired objective, with the understanding that minimizing data usage promotes more sustainable and less risky analysis.

Following the above principles should help keep hidden insights or correlations from being used in harmful ways, such as disenfranchising individuals based on race or demographics.
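
The data-minimization idea above can be made concrete with a short sketch. It is only an illustration and does not come from the Altimeter report or the IAF paper: the records, the field names, and the helper functions `minimize` and `average_spend_of_frequent_visitors` are all hypothetical. The point is simply that an analysis can be handed the few fields it needs rather than the full customer record.

```python
# Hypothetical customer records; every field name and value is invented
# purely for illustration.
RECORDS = [
    {"customer_id": 1, "visits": 12, "avg_spend": 80.0,
     "birth_date": "1980-04-02", "home_address": "10 Example St"},
    {"customer_id": 2, "visits": 3, "avg_spend": 40.0,
     "birth_date": "1992-11-17", "home_address": "22 Sample Ave"},
    {"customer_id": 3, "visits": 9, "avg_spend": 120.0,
     "birth_date": "1975-06-30", "home_address": "5 Placeholder Rd"},
]

# The only fields this (hypothetical) analysis needs to answer its question.
REQUIRED_FIELDS = ("visits", "avg_spend")


def minimize(records, required_fields):
    """Return new records that keep only the required fields."""
    return [
        {field: record[field] for field in required_fields}
        for record in records
    ]


def average_spend_of_frequent_visitors(records, min_visits):
    """Average spend among customers with at least min_visits visits."""
    spends = [r["avg_spend"] for r in records if r["visits"] >= min_visits]
    if not spends:
        raise ValueError("no customers meet the visit threshold")
    return sum(spends) / len(spends)


if __name__ == "__main__":
    slim = minimize(RECORDS, REQUIRED_FIELDS)
    # The analysis runs on the reduced records; identifiers, birth dates,
    # and addresses never reach it.
    print(average_spend_of_frequent_visitors(slim, min_visits=5))
```

Because the reduced records carry no identifiers, birth dates, or addresses, a mistake or leak in the analysis step exposes less, which is the "less risky" half of the principle.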

3. Sustainable

“Big-data insights, when placed into production, should provide value that is sustainable over a reasonable time frame.” (IAF)

Sustainability, according to the authors, breaks down into three categories: data, algorithmic, and device- and/or manufacturer-based.

  • Data sustainability: Sustaining value is closely tied to the access organizations have to different social data sets. “While this is a fact of access and economics, it can wreak havoc when sets of data from public and private sources are combined,” write Etlinger and Groopman. “The issue of sourcing also comes into play… Inconsistencies in sample sizes or methodologies affect the integrity of the data and the sustainability of the algorithm.”
  • Algorithmic sustainability: A critical element of sustainability is an algorithm’s longevity. The Altimeter report suggests longevity is affected by how the data is collected and analyzed.
  • Device- and/or manufacturer-based sustainability: A third consideration is the lifespan of the data being collected. “For example, if a company develops a wearable or other networked devices that collect and transmit data, what happens if that product is discontinued, or the company is sold, and the data is auctioned off to a third party?” ask Etlinger and Groopman.

4. Respectful

“Big-data analytics affect individuals to whom the data pertains, organizations that originate the data, organizations that aggregate the data, and those that might regulate the data in different ways.” (IAF)

Not mincing words, Etlinger and Groopman state, “The advent of social and device-generated data captured in real time decimates the norms for data analytics… As a result, even seemingly minor decisions can have tremendous downstream implications.”

As might be expected, the individuals who originated the data are affected the most by big-data analysis, particularly when it makes private, semi-private, or even public information more public.

5. Fair

“In lending and employment, United States law prohibits discrimination based on gender, race, genetics, or age. Yet, big data processes can predict all of those characteristics without actually looking for fields labeled gender, race, or age.” (IAF)
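
A toy example may help show how a characteristic can be predicted without any field labeled for it. The sketch below is an assumption-laden illustration, not a method from the IAF paper or the Altimeter report: the rows are synthetic, the `zip_code` and `group` names are hypothetical, and accuracy is measured on the same rows used to build the lookup, which is a simplification. It compares guessing the most common group overall with guessing the most common group within each ZIP code.

```python
from collections import Counter, defaultdict

# Synthetic rows invented for illustration: (zip_code, group).
# "group" stands in for a protected characteristic that a model
# would never be given directly.
ROWS = [
    ("11111", "A"), ("11111", "A"), ("11111", "A"), ("11111", "B"),
    ("22222", "B"), ("22222", "B"), ("22222", "B"), ("22222", "A"),
]


def baseline_accuracy(rows):
    """Accuracy of always guessing the single most common group."""
    group_counts = Counter(group for _, group in rows)
    return max(group_counts.values()) / len(rows)


def majority_group_by_zip(rows):
    """Map each ZIP code to the group that appears most often in it."""
    counts = defaultdict(Counter)
    for zip_code, group in rows:
        counts[zip_code][group] += 1
    return {z: c.most_common(1)[0][0] for z, c in counts.items()}


def proxy_accuracy(rows):
    """Accuracy of guessing the group from the ZIP code alone."""
    lookup = majority_group_by_zip(rows)
    hits = sum(1 for zip_code, group in rows if lookup[zip_code] == group)
    return hits / len(rows)


if __name__ == "__main__":
    print("baseline:", baseline_accuracy(ROWS))  # 0.5
    print("zip only:", proxy_accuracy(ROWS))     # 0.75
```

In this synthetic data the ZIP code alone lifts the guess from 50 percent to 75 percent, so dropping the `group` column does not by itself keep the characteristic out of an analysis.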

Etlinger and Groopman consider the ability to predict such characteristics at any level to be an invitation to what they call unintended consequences. To show how these can be countered, the authors again use Caesars Entertainment as an example, writing:

“Caesars has a simple yet effective litmus test for fairness, which it calls the Sunshine Test–whether the issue can be discussed openly and the final decision disclosed without any sense of misgiving. Before deciding on a course of action that requires customer data, the company’s executives imagine how people would react if all of the details were out in the open, in the light of day. Would it strengthen or threaten customer relationships?”

Joshua Kanter adds, “If the initiative fails the Sunshine Test, we do not move forward.”

Final thoughts

Etlinger and Groopman suggest that applying the five principles of ethical data use is a pragmatic approach for businesses. The two authors note, “At the same time, data complexity, differences in business models, emerging technologies, and most importantly, people, mean that no single approach will address every scenario.”