Big data and its business potential are justifiably big news. The legal issues and ramifications of big data, however, are not yet well understood. As enterprises are just beginning to adopt and integrate the technologies, the time to take a serious look at the issues is now.

In doing so, companies should bring together their legal advisors and IT decision makers, with the keen and active involvement of C-suite executives and board members. An enterprise that has not investigated and addressed the legal risks of big data early in deployment exposes itself to litigation and regulatory scrutiny.

This article reviews the current thinking on big data and legal risk by addressing the following topics:

  • Data ownership and security
  • Consumer privacy
  • Third-party contracts
  • Regulatory compliance
  • Underlying contracts
  • Legal discovery

Companies should consider the entire data lifecycle when analyzing big data and legal risk: data generation, transfer, use, transformation, storage, archival, and destruction. They should also consider the nature of the data, the gateways of data collection, government regulation, state and federal laws, and international legal implications.

Determining the nature of the data is essential to any effective analysis. Are the data personally identifiable information (PII)? Are they healthcare (HIPAA) or financial (GLB Act) information? Location data are especially sensitive given the current debate about digital surveillance, marketing, and individual privacy.
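One practical way to start answering these questions is to tag each field in a dataset with the category it likely falls into before any analysis begins. The sketch below is a hypothetical illustration: the field names, the `DataCategory` labels, and the mapping table are assumptions made for this example, not a legal classification. Deciding what actually counts as PII, HIPAA, or GLB Act data still requires counsel's review.

```python
from enum import Enum


class DataCategory(Enum):
    PII = "personally identifiable information"
    HEALTH = "healthcare information (possibly HIPAA)"
    FINANCIAL = "financial information (possibly GLB Act)"
    LOCATION = "location data"
    OTHER = "unclassified; needs manual review"


# Hypothetical field-name mapping for illustration only. A real data
# inventory would be built with data owners and legal counsel.
FIELD_CATEGORIES = {
    "name": DataCategory.PII,
    "ssn": DataCategory.PII,
    "date_of_birth": DataCategory.PII,
    "diagnosis_code": DataCategory.HEALTH,
    "account_balance": DataCategory.FINANCIAL,
    "gps_lat": DataCategory.LOCATION,
    "gps_lon": DataCategory.LOCATION,
}


def classify_fields(fields):
    """Group field names by category; unknown names fall into OTHER."""
    groups = {}
    for field_name in fields:
        category = FIELD_CATEGORIES.get(field_name.lower(), DataCategory.OTHER)
        groups.setdefault(category, []).append(field_name)
    return groups


def sensitive_fields(fields):
    """Return the fields that map to a known sensitive category (not OTHER)."""
    return [f for f in fields
            if FIELD_CATEGORIES.get(f.lower(), DataCategory.OTHER)
            is not DataCategory.OTHER]


columns = ["Name", "purchase_total", "gps_lat", "diagnosis_code"]
for category, names in classify_fields(columns).items():
    print(f"{category.value}: {names}")
```

Unknown fields are routed to an "unclassified" bucket rather than treated as safe, so anything the mapping does not recognize still gets a human look.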

Enterprises can focus on four practices when assessing the legal issues of adopting big data:

  • Well-defined data ownership with regard to data generators, collectors, and processors
  • Transparent policies for individuals and other relevant entities about the use of data
  • Thorough and ongoing review of state laws, court cases, and federal regulation and legislation
  • Avoidance of “invasive” marketing, i.e., showing customers that you know much more about them than they would like. “Creeping out” your customers may not be illegal, but it’s not good business either. (The classic example is when Target sent pregnancy-specific coupons to a teenage girl before she told her family that she was pregnant.)

Data ownership

When analyzing data security and consumer privacy, the ownership of data is a logical place to start. Big data analytics platforms will create challenges in determining data ownership, which in turn depends on the nature of the data, how they were generated, how they were collected, where they came from (which affects whether state or international laws apply), and whether they are attributable to a person or to a machine or device.

The main concern consumers and users have with big data is privacy. But the increased availability and volumes of data, by themselves, are arguably not the problem; the problems will come with incomplete or incorrect contexts, mistaken meanings, and harmful or invasive actions.

Legal analysis is an important part of determining who has rights to specific data, but the context of the data is essential as well. Conflicts between an organization’s use of consumer-generated data and individuals’ rights over those data will center on where the data came from and, therefore, on who owns them. Moreover, who has rights to the data will determine who gets to use them and how, or whether the data get used at all.

This is a complex and quickly developing area of legal practice and technological application. Courts and legislatures are most often backward-looking. In our current paradigm of reactive governance, the law simply cannot keep up with the pace of disruptive technologies. Enterprises will need to monitor legal developments around big data and carefully assess their risk exposure.

Data security

Big data has the potential to magnify security and privacy issues. Therefore, it is critical that lawyers and IT leaders consider data protection, privacy, and confidentiality in terms of how their organization sources, analyzes, reports, and stores data.

Larger volumes of big data create a higher risk of data breach. The more data are concentrated, the more attractive a target they become for hackers and the greater the consequences of a breach. Security breaches can be very costly for businesses, bringing regulatory action, civil litigation, and damages paid to consumers. Companies must invest in effective data security and recognize that managing cyber-risk is part of doing business in the digital age.

Elements of big data information security include:

  • Data integrity and privacy
  • Encryption
  • Access control
  • Chain of custody
  • Relevant laws and regulations
  • Corporate policies
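Chain of custody is the element on this list that most directly translates into a technical control. A common technique for making a custody log tamper-evident is to hash-chain it: each entry stores a hash of its own contents plus the previous entry's hash, so editing any earlier entry breaks every link after it. The sketch below assumes a simple in-memory log with hypothetical fields (`actor`, `action`, `dataset`) and omits timestamps and signatures for brevity; it is an illustration of the idea, not a complete evidentiary system. Note that it cannot detect removal of the most recent entries unless the latest hash is also stored somewhere the log's custodian cannot alter.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, event):
    """SHA-256 over the event (canonical JSON) plus the previous entry's hash."""
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class CustodyLog:
    entries: list = field(default_factory=list)

    def record(self, actor, action, dataset):
        """Append an event linked to the hash of the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        event = {"actor": actor, "action": action, "dataset": dataset}
        self.entries.append({"event": event, "prev": prev,
                             "hash": entry_hash(prev, event)})

    def verify(self):
        """Recompute every link; an edited, reordered, or removed middle entry fails."""
        prev = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev or entry_hash(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = CustodyLog()
log.record("ingest-service", "collected", "customer_events")
log.record("analyst-a", "exported", "customer_events")
print(log.verify())                                 # True
log.entries[0]["event"]["actor"] = "someone-else"   # simulate tampering
print(log.verify())                                 # False
```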

Areas where enterprise information security policies need to be monitored include:

  • Vendor agreements
  • Data ownership
  • Custody requirements
  • International regulations
  • Terms of confidentiality
  • Retention and archiving of data
  • International and geographical issues

Consumer advocates, businesses, and government should focus on creating a well-established legal framework to address sensitive user data, social data analysis, and the cross-referencing and mingling of data obtained from diverse sources. Such a framework can protect data originators (who are often consumers), businesses that collect data, and analysts of large data sets.

Consumer privacy

Privacy agreements are the foundation of big data contracts. To understand them, it is necessary to know what is meant by personal information and how a privacy policy addresses it.

The National Institute of Standards and Technology (NIST) defines personally identifiable information (PII) as “any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.”

“The extent of consumer profiling today means that data brokers often know as much — or even more — about us than our family and friends, including our online and in-store purchases, our political and religious affiliations, our income and socioeconomic status, and more,” said FTC Chairwoman Edith Ramirez regarding a May 2014 report on data brokers. “It’s time to bring transparency and accountability to bear on this industry on behalf of consumers, many of whom are unaware that data brokers even exist.”

U.S. privacy law draws distinctions between public data and private data. Private data are data kept strictly confidential. Public data are data that have been disclosed fully to the public or partially, under specific circumstances, to a limited audience. That distinction creates legal challenges regarding the disclosure of big data, because in the real world the line between public and private data is not always obvious.

To comprehend and manage the legal issues, one needs to translate the legal distinction into a real-world difference between what is public and what is private. Initially, this means determining who originated the data, i.e., the initial owner of the information. Figuring out who might have an ownership claim on the data will help an organization identify whom it might have to contact to verify whether that party has an expectation of privacy.

Reasonable expectations of privacy

Once the players and their possible ownership interests are identified, one can look at the privacy issues. Individuals and organizations may expect that most, or at least part, of the information they provide will remain private. They may not even expect that a data processor will use their information. Violations of a reasonable expectation of privacy can result in litigation.

The courts have identified two factors that indicate an expectation of privacy: the likelihood that the information will be discovered and the likelihood of discovery by a third party with harmful intentions. Courts of law often ask questions about expectations of privacy such as:

  • Was the information known to a limited group of people?
  • Was it unlikely to spread beyond that group?
  • Does the expectation of privacy become unreasonable when an individual shares that information with others, knowing that there is a risk it may spread?
  • Was the disclosure of information inadvertent or involuntary?
  • Was the information acquired by means of overzealous surveillance in public?

A privacy policy is a contract between a data collector and a data originator, the individual or entity that provided the data. A lawyer looking at a privacy policy needs to ask the following:

  • How does the policy define personal information?
  • How can the information be used, and to whom may it be disclosed?
  • Are limitations on disclosure spelled out, or are they stated only in general terms?
  • What consent has been given to disclosure?
  • Does the policy allow for disclosure?
  • In what form can information be disclosed?
  • Can the policy allow the kind of disclosure the data collector expects?
  • Does the data originator have a reasonable expectation of privacy?
  • Is permission required from the originator before information is transferred or processed?

Third-party contracts and discrimination concerns

Contracts with third-party big data providers and analysts require special attention, and robust controls are necessary. Because more and more big data solutions and technologies are supplied by third parties, an enterprise needs restrictions and protections in place to ensure that backdoor discovery of PII does not occur.

Such third parties provide products and services for analyzing, managing, and storing highly complex and sensitive data, such as consumer behavior or personal health information. Relying on third parties for analysis can create significant liability risk, for example, when the output data are based on inaccurate or incomplete information, when expected correlations do not hold, or when hackers are able to re-identify individuals by combining multiple data sources.
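Re-identification often works through a linkage attack: a dataset stripped of names is joined to a public dataset, such as a voter list, on quasi-identifiers like ZIP code, birth date, and sex. Where that combination matches exactly one person, the "anonymous" record gets a name again. The sketch below illustrates the mechanism with made-up records; the record types, field names, and sample data are hypothetical.

```python
from typing import NamedTuple


class DeidentifiedRecord(NamedTuple):
    zip_code: str
    birth_date: str
    sex: str
    diagnosis: str  # sensitive attribute; names were removed


class PublicRecord(NamedTuple):
    name: str
    zip_code: str
    birth_date: str
    sex: str


def quasi_identifier(record):
    """Attributes that are not names but can jointly single a person out."""
    return (record.zip_code, record.birth_date, record.sex)


def link_records(deidentified, public):
    """Return (name, diagnosis) pairs whose quasi-identifier matches exactly one public record."""
    index = {}
    for person in public:
        index.setdefault(quasi_identifier(person), []).append(person)
    matches = []
    for record in deidentified:
        candidates = index.get(quasi_identifier(record), [])
        if len(candidates) == 1:  # a unique match is a likely re-identification
            matches.append((candidates[0].name, record.diagnosis))
    return matches


# Made-up sample data for illustration.
health_data = [
    DeidentifiedRecord("02138", "1962-03-14", "F", "hypertension"),
    DeidentifiedRecord("02139", "1985-01-15", "M", "asthma"),
]
public_list = [
    PublicRecord("Jane Roe", "02138", "1962-03-14", "F"),
    PublicRecord("John Doe", "02139", "1985-01-15", "M"),
    PublicRecord("Richard Roe", "02139", "1985-01-15", "M"),
]
print(link_records(health_data, public_list))  # [('Jane Roe', 'hypertension')]
```

The second health record is not linked because two public records share its quasi-identifier. Generalizing or suppressing quasi-identifiers so that every record is indistinguishable from several others (the idea behind k-anonymity) is one way to make such joins ambiguous, which is why contracts with third parties should address how shared data are de-identified.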

Potential problems in contracts with third-party big data providers include:

  • Inadvertent data loss, such as stripping metadata and truncating communication threads
  • Custody and control of data
  • Disagreements with relevant policies and procedures
  • International rules and regulations

Some observers have voiced concern that big data can lead to forms of discrimination that are more automated and harder to detect. The underlying data could come from social media or include healthcare or biometric information. Part of the solution will be for data brokers and companies to be more transparent about how they use information to make decisions about consumers and actual customers. IT leaders and their legal counsel should expect more consumer action and government inquiries in this area.

Regulatory compliance and underlying contracts

Due diligence requires that attorneys examine the contracts between all the players to understand the legal ramifications of a big data transaction. A significant question is whether a contracted data collector is subject to a high degree of regulatory scrutiny. Prime examples are the healthcare and financial services industries.

The place to start researching regulatory issues and concerns with big data is the Federal Trade Commission. Because many big data transactions involve marketing to consumers, they fall under the FTC’s jurisdiction. Legal counsel will also have to assess the relevant federal and state laws when examining a big data contract. And when necessary, lawyers drafting a contract will have to consider whether data will leave the United States and, if so, the laws and regulations of the countries involved.

Other issues include:

  • Warranties: Lawyers have to consider warranties and contracts that cover the accuracy and completeness of data, compliance with privacy policies, compliance with privacy expectations, and warranties required by law.
  • Indemnification agreements: These provide protection against breaches of warranties and of contract terms governing privacy and data security. They can also address intellectual property infringement.
  • Control of data: Control can change as data move from player to player. Who is responsible for the data as they move through the transaction, and how will a contract ensure that roles and obligations are balanced?
  • Contract termination: How will the contract end, what rights and obligations survive contract termination, and who is responsible for data security after termination?

Legal discovery

Enterprises must realize that the use of big data analytics may open the door for legal discovery by opposing litigants and government regulators. Technical and other limits on data retrieval have diminished with the advent of big data, and companies may be forced to produce the raw data underlying their big data analyses. This can include sensitive proprietary information and PII. Before releasing a big data analysis, a company needs to perform a legal risk analysis of the information, players, and issues involved.

Once the legal discovery process has begun, it may prove difficult for a company and its attorneys to limit the scope of the investigation, so the company may end up producing more data than necessary. Legal challenges to such wide-ranging investigations are still relatively new, and there are no well-established industry best practices.

In drafting contracts, it is essential to spell out which party in a big data transaction will be responsible for responding to legal requests: the data collector, processor, or end user?


A legal practitioner should be well versed in big data issues and capabilities and take a data-centric view of his or her clients’ products and services. Likewise, IT leaders and company executives need to appreciate the perspective and expertise that legal counsel provides. Executive support and oversight is a key success factor in any big data risk management program. Big data sources and the use of big data must be protected just like any other sensitive corporate document, data set, or record.

Companies should pursue a practical, holistic approach to big data and legal risk mitigation:

  • Use a cross-functional approach
  • Produce written standard operating procedures and protocols
  • Leverage native functionality when responding to legal requests for information
  • Seek to report what is sufficient and appropriate for the courts
  • Implement changes consistently across all departments

If organizations create risk programs and do not implement them, or pursue them inconsistently, they can wind up having even bigger problems in the courts and with investigating regulators.

Issues on the horizon include increasing consumer awareness of how big data affects them, and the security and privacy risks of “dark data,” large volumes of uncategorized and un- or poorly permissioned text and image data in enterprise storage systems. Companies will have to assess whether certain types of data are necessary to their business model. Social media, mobile devices, and within several years, the Internet of Things will continue to generate ever-larger amounts of data.

And it cannot be stressed enough: Consult with your legal counsel, stay up to date with big data legal developments and best practices, and never stop trying to get the answers before competitors, litigation opponents, and government regulators get them!

Subscribe to the TechRepublic Premium Exclusives Newsletter

Save time with the latest TechRepublic Premium downloads, including customizable IT & HR policy templates, glossaries, hiring kits, features, event coverage, and more. Exclusively for you! Delivered Tuesdays and Thursdays.
