DataGravity's unique take on data loss prevention

DataGravity leverages storage arrays for data loss prevention. Keith Townsend points out that tech is just one aspect of a data center security solution.


One of the constant frustrations of managing an enterprise data center is protecting sensitive data. Recent breaches have made companies painfully aware that credit card data isn't the only data that needs to be protected; personally identifiable information (PII), trade secrets, and sensitive correspondence have also proven valuable to bad actors.

Data loss prevention (DLP) solutions have been on the market for several years; most of them focus on either the endpoint or the network. Recently, there has been an effort to focus on the storage array, and storage vendor DataGravity tackles DLP there. In this post, I'll introduce the system and some considerations.

Technology is only part of the solution

Before getting into the specifics of DataGravity, I need to stress the importance of a strong enterprise security policy; DLP technology is an addition to a strong security program, not a substitute for one. In every host-based DLP deployment I've implemented, the most challenging aspect has been the maturity of the organization's security training and program.

An example is the forced encryption of USB drives. I led the deployment of an endpoint DLP solution that encrypted all files copied to external media. Users found the solution too intrusive to their existing workflow. Instead of embracing the convenience of automatic encryption for critical data, they turned to even less secure ways of transferring data, such as file sharing sites. The education of end users is the most important aspect of protecting sensitive data.

DLP should be deployed to help protect against unauthorized access, not to force end users into strong data protection habits.

The unstructured data challenge

Sensitive data exists all over the data center and enterprise. The obvious place to find it is within application data. Well-written applications govern the flow of, and access to, sensitive data via controls within the application; however, data will escape these walled gardens and appear as unstructured data on disk. Identifying the resulting unstructured sensitive data can be a challenge.

One way to identify and control sensitive data located on the network is to leverage host-based DLP. Products from traditional enterprise security companies such as McAfee and Symantec use centralized management solutions to dictate data policies on local servers and workstations. In theory, endpoint-based DLP inspects all unstructured data that traverses the endpoint; the method is very similar to virus protection, and it shares some of the disadvantages of virus protection. The endpoint approach uses significant CPU resources and requires that an agent be installed on each endpoint.

DataGravity's unique approach

DataGravity incorporates the metadata needed to track sensitive data on the actual storage array. Identifying and tagging sensitive data is one of the most challenging and compute-heavy aspects of data protection. DataGravity shifts the identification burden to the array and gives deeper levels of visibility.

DataGravity can identify sensitive unstructured data using algorithmic patterns such as validated credit card and Social Security numbers. End users can also define their own patterns and manually tag sensitive data. Once sensitive data is identified, DataGravity can report on or prevent access to PII.
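To make "validated" pattern matching concrete, here is a minimal Python sketch of the general technique. It is not DataGravity's implementation, which the vendor has not published here; the regular expressions, the function names (luhn_valid, ssn_plausible, find_sensitive), and the length limits are all assumptions chosen for illustration. The idea is that a candidate number found by a pattern is only reported if it also passes a check: the Luhn checksum for card numbers, and basic structural rules for Social Security numbers.

```python
import re

# Hypothetical patterns for illustration; real products use broader, tuned rules.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")   # 13 to 19 digits, optional separators
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")  # NNN-NN-NNNN


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_plausible(area: str, group: str, serial: str) -> bool:
    """Reject SSN-shaped strings that can never be issued numbers."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for validated candidates in text."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(("credit_card", m.group()))
    for m in SSN_RE.finditer(text):
        if ssn_plausible(*m.groups()):
            hits.append(("ssn", m.group()))
    return hits


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, SSN 078-05-1120, order 4111 1111 1111 1112"
    for kind, value in find_sensitive(sample):
        print(kind, value)
```

The validation step is what keeps false positives manageable: in the sample, the order number looks like a card number but fails the checksum, so it is not flagged. Passing validation still does not prove a number is real, which is why manual tagging and user-defined patterns remain useful alongside it.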

I spoke with DataGravity's CEO Paula Long, who discussed potential use cases for the technology. One of the chief use cases is looking for PII within virtual machine (VM) images that reside in the array. The VM image inspection removes the barrier and performance overhead of running DLP on each VM. Further, sensitive data is tracked regardless of the power status of the VM. If the VM image file moves, DataGravity will track the movement of the associated PII. The recently announced version of DataGravity integrates with VMware vRealize to automate policy enforcement.


DataGravity adds a resource in the fight to control sensitive data, but it isn't a magic bullet for DLP; there are still mail, endpoints, and legacy arrays that need to be considered. Also, no solution yet provides a single pane of glass or a unified reporting mechanism for DLP across all sources.

Is array-based DLP a solution to a problem you are experiencing, or is it a solution looking for a problem? I'd love to read your thoughts in the comments.
