Google's data loss prevention API helps enterprises mask sensitive information in the cloud

The DLP API now includes data de-identification capabilities such as redaction, masking, and tokenization, to keep sensitive information safer.

On Thursday, Google announced new features for its Data Loss Prevention (DLP) API that will give enterprise users a more secure way to run analytics on sensitive data in the cloud and turn it into business insights.

The DLP API, which was released in beta in March, uses machine learning to help organizations quickly identify and secure more than 50 different types of sensitive data, including personally identifiable information (PII) such as credit card numbers, names, and national ID numbers, according to a blog post from Google product manager Scott Ellis.

The new features add data de-identification capabilities to the DLP API. This will enable enterprises to work with sensitive information while making it harder to associate that information with an individual, thereby reducing the risk of compromising that data, Ellis wrote in the post.

The DLP API helps organizations enforce "need-to-know" access policies in production applications and data workflows, and it can be used with virtually any data set or storage system, Ellis wrote. Native support for examining large data sets at scale in Google Cloud Storage, Datastore, and BigQuery is built in as well.

"Google Cloud DLP API enables our security solutions to scan and classify documents and images from multiple cloud data stores and email sources," Sateesh Narahari, vice president of products at ManagedMethods, wrote in the post. "This allows us to offer our customers critical security features, such as classification and redaction, which are important for managing data and mitigating risk."

The DLP API will support the following de-identification tools:

  • Redaction and suppression

This feature removes entire records or values from a dataset, Ellis wrote. He offered the example of a support agent working in a customer support UI, who doesn't need any identifying details to solve the problem. In that case, a company could decide to redact that information with this tool.

For example, if a customer request said "Hello, this is Samantha Robertson. My order failed. Do you need to validate my SSN 123-45-6789? If you need to contact me, please call 858-222-333 or email," the API's redaction tool could render it as "Hello, this is *** ***. My order failed. Do you need to validate my ***? If you need to contact me, please call *** or email ***."
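To make the idea concrete, here is a minimal sketch of pattern-based redaction in plain Python. It does not call the DLP API; the `PATTERNS` table and the `redact` function are hypothetical names invented for illustration, and simple regular expressions stand in for the API's detectors. Names such as "Samantha Robertson" are not covered, since finding them takes more than a fixed pattern.

```python
import re

# Hypothetical patterns for illustration only; the DLP API uses its own detectors.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str, placeholder: str = "***") -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text


message = "My SSN is 123-45-6789. Call 858-651-4765 or email sam@example.com."
print(redact(message))
```

The sample message and email address are made up for the demonstration; the point is only that matched values are removed outright rather than transformed.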

  • Partial masking

Partial masking hides part of a sensitive piece of data, such as the last seven digits of a US telephone number, Ellis wrote. For example, with partial masking, you could put in "This is my phone number: 858-651-4765," and then see "This is my phone number: 858-###-####," according to the post.
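The phone number example can be reproduced with a short Python sketch. This is an illustration of the technique rather than the DLP API itself; the `mask_phone` function and its regular expression are assumptions made for the example.

```python
import re

# Capture the area code so it can be kept; the rest of the number is masked.
PHONE = re.compile(r"\b(\d{3})-\d{3}-\d{4}\b")


def mask_phone(text: str, mask_char: str = "#") -> str:
    """Keep the area code and mask the last seven digits of each phone number."""
    return PHONE.sub(
        lambda m: f"{m.group(1)}-{mask_char * 3}-{mask_char * 4}", text
    )


print(mask_phone("This is my phone number: 858-651-4765"))
```

Keeping the area code leaves some analytic value (for instance, rough geography) while hiding the digits that identify an individual line.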

  • Tokenization/secure hashing

Tokenization, also known as secure hashing, is "an algorithmic transformation that replaces a direct identifier with a pseudonym or token," Ellis wrote. Enterprises could tap this when they need to join data or retain a record identifier, but don't want the sensitive underlying elements displayed. The DLP API supports format-preserving encryption, which provides a token of the same length and character set (for example, changing the numbers in a phone number). It also supports secure, key-based hashes, which are tokens made of a 32-byte hexadecimal string created with a data encryption key (for example, it might change the phone number "858-651-4765" to "ga+32mx32s2as8cw38AEfknsFthc").
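A key-based hash of this kind can be sketched with Python's standard library. The example below uses HMAC-SHA256, which is an assumption chosen for illustration and not necessarily the algorithm the DLP API uses; the `tokenize` function and the demo key are hypothetical. Format-preserving encryption is not shown.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic keyed hash (HMAC-SHA256) of the value, as hex."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"  # hypothetical key for the demo

t1 = tokenize("858-651-4765", key)
t2 = tokenize("858-651-4765", key)
t3 = tokenize("858-651-4766", key)

print(t1 == t2)  # same input and key give the same token
print(t1 == t3)  # a different input gives a different token
```

Because the same input and key always yield the same token, two tables tokenized with one key can still be joined on the token column without exposing the original phone numbers.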

  • Dynamic data masking

Dynamic data masking (DDM) refers to the DLP API's ability to apply several different de-identification and masking techniques in real time. Enterprises could use this if they don't want to alter any underlying data, but do want to mask it when it is viewed by employees. Ellis offers the example of a company masking data when it is presented in a UI, but requiring special access privileges if someone needs to view the underlying PII.
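One way to picture dynamic masking is a read path that masks values on the way out while leaving stored data alone. The sketch below is plain Python, not the DLP API; `RECORD`, `view_record`, and the `can_view_pii` flag are hypothetical names standing in for a real data store and access-control check.

```python
import re

PHONE = re.compile(r"\b(\d{3})-\d{3}-\d{4}\b")

# Stored data; the functions below never modify it.
RECORD = {"name": "Samantha Robertson", "phone": "858-651-4765"}


def view_record(record: dict, can_view_pii: bool) -> dict:
    """Return a copy of the record, masked unless the viewer may see PII."""
    if can_view_pii:
        return dict(record)
    masked = dict(record)
    masked["name"] = "***"
    masked["phone"] = PHONE.sub(r"\1-###-####", record["phone"])
    return masked


print(view_record(RECORD, can_view_pii=False))
print(view_record(RECORD, can_view_pii=True))
```

The masking decision is made per request, so the same record can appear masked to a support agent and unmasked to someone with special access privileges.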

The DLP API also offers tools such as bucketing, k-anonymity, and l-diversity to help businesses gain better insights from their data.
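As a rough illustration of how bucketing and k-anonymity relate, the Python sketch below generalizes exact ages into ten-year ranges and then measures k, the size of the smallest group of rows that share the same quasi-identifier values. The sample rows, `bucket_age`, and `k_anonymity` are hypothetical and do not reflect how the DLP API computes these metrics.

```python
from collections import Counter


def bucket_age(age: int, width: int = 10) -> str:
    """Generalize an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity(rows: list, quasi_identifiers: list) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Made-up sample data for the demonstration.
rows = [
    {"age": 31, "zip": "92101"},
    {"age": 34, "zip": "92101"},
    {"age": 38, "zip": "92101"},
    {"age": 42, "zip": "92101"},
    {"age": 47, "zip": "92101"},
]
bucketed = [{**row, "age": bucket_age(row["age"])} for row in rows]

print(k_anonymity(rows, ["age", "zip"]))      # every exact age is unique
print(k_anonymity(bucketed, ["age", "zip"]))  # ranges group rows together
```

A higher k means each person is harder to single out, at the cost of coarser data for analysis.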

As ZDNet noted when the API was released in March, this tool and its new features will be especially useful in highly regulated industries such as healthcare and finance.

Business users can get started with the DLP API beta today with these guides.

The 3 big takeaways for TechRepublic readers

1. New features for Google's Data Loss Prevention (DLP) API will give enterprise users a better way to run analytics on sensitive data in the cloud and turn it into business insights.

2. The new features add data de-identification capabilities to the DLP API, including redaction, masking, and tokenization.

3. The DLP API includes native support and scale for scanning large data sets in Google Cloud Storage, Datastore, and BigQuery.
