PROS

  • Simple browser interface
  • Ready-to-run virtual appliance
  • VMs can be analysed when powered off
  • Automatic tagging of sensitive data
  • Can identify dormant data
  • Option to trigger snapshots in response to high data change rates
  • Search snapshots and recover files directly into a VM

CONS

  • VMware-specific; can’t be used with other hypervisors
  • Significant storage overheads
  • Currently only available in North America
  • Standard licence is expensive

Price
From $83 per VM/month


DataGravity for Virtualization (DGfV for short) is one of those products you don’t realise you need until you try it, after which you’ll find all kinds of uses for it. It’s also remarkably easy to get to grips with. Essentially, it helps you find out what’s in the unstructured data stored on your IT systems, which means just about everything that’s not held in a database: documents, spreadsheets, presentations, videos, scripts and other common files.

As well as discovering and analysing content, DGfV can put that content into context by working out who created or last worked on individual files, when and where they’re stored, how many copies there are and so on, and, crucially, by tracking any changes made over time. All of this could feasibly be done inside a single virtual machine, but DGfV does it across all the VMs in an organisation and uses the resulting insights to generate reports: for example, to identify files that contain sensitive information or that fall outside compliance limits.

The Essentials edition costs from $83 per VM/month; opt for the more expensive Standard edition (from $167 per VM/month) and DGfV can also be used to forensically analyse security breaches, as well as prompt for action and invoke automatic snapshots to address a variety of potential issues, from highlighting obsolete data through to recovering from ransomware attacks.

Getting started

As you might guess from the name, DataGravity for Virtualization is designed for virtual environments, on the assumption that most enterprises will have migrated the bulk of their servers and applications onto virtual machines. More specifically, the current implementation is written for VMware and is delivered as a ready-to-deploy virtual appliance, along with a set of discovery tools that are automatically installed into the VMs being managed as part of the discovery process.

With support for both NFS and iSCSI datastores plus vCenter integration, DGfV can handle most types of storage. Whenever a VM is added to the DGfV inventory, an initial baseline analysis is performed using a read-only snapshot. This has a slight impact on VM performance but, after that, only subsequent changes are analysed. The frequency of analysis can also be customised, and a VM doesn’t need to be powered on for an analysis to run.

The DGfV appliance needs at least four virtual cores, 16GB of RAM and some storage space of its own to hold all the analyses. We weren’t able to check the storage overhead during the review but, according to the developers, it’s around 10-15 percent, which is significant. However, this could easily be recovered by using the product to identify duplicate and obsolete files.

Because it’s designed to work across busy VMs in a large data centre environment, this kind of product is far from easy to evaluate, so for our review DataGravity provided remote access to its lab, already set up with the product and an array of virtual machines to work with. The company also supplied a comprehensive set of guides and common use scenarios to work through. Here are a few to give you a flavour of what DGfV can do.

Discovering sensitive data

The first scenario we looked at was all about finding and managing sensitive data: social security numbers, credit card details, internal product codes and suchlike. This is material you don’t want to end up in the wrong hands but which, nevertheless, can proliferate and turn up in unexpected places. In fact, according to DataGravity, the company has yet to find a customer who, having signed up for a product trial, doesn’t discover large amounts of sensitive material lurking unseen on their servers.
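Spotting items such as credit card or social security numbers in free text generally comes down to pattern matching, often backed by a checksum to weed out false positives. DataGravity doesn’t document how its own detection works, so what follows is only a hypothetical sketch: the regular expressions and the `tag_text` function are illustrative assumptions, while the Luhn checksum is the standard check digit scheme used on payment card numbers.

```python
import re

# Illustrative patterns only; DGfV's real detection rules are not published.
# 13-16 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
# US SSN layout AAA-GG-SSSS, excluding some never-issued ranges.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers; filters out random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def tag_text(text: str) -> dict[str, int]:
    """Count hits for two hypothetical Tags in a block of extracted text."""
    cards = sum(
        1
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(re.sub(r"\D", "", m.group()))  # strip separators first
    )
    return {"credit_card": cards, "ssn": len(SSN_PATTERN.findall(text))}


sample = "Card 4111 1111 1111 1111, SSN 123-45-6789, order 4111111111111112"
print(tag_text(sample))  # {'credit_card': 1, 'ssn': 1}
```

The second 16-digit run fails the Luhn check, which is how a checksum keeps order numbers and similar digit strings from being flagged as card numbers.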

To help find such material, DGfV can index over 600 file types and mark files using a set of predefined Tags that identify the presence of common text and number patterns such as credit card numbers, patient ID numbers and so on. These Tags can be edited, and new ones added, using a custom wizard, and they can be managed together through customisable Insight Profiles as shown here:

From its simple web-based management console, DGfV also provides summary insights into all the content it finds across the managed VMs. This is shown in the next screenshot, which summarises the content of a selected VM, with the Tags found colour-coded and organised into a tile to the right of the display:

As indicated by the red ‘SS’ bubble, 31 hits for social security numbers have been identified, and if we click on this the software drills down to a list of the offending files. This can then be further filtered and examined in detail, with the ability to preview file content where it makes sense, as with the spreadsheet in the example below:

To stop sensitive data becoming a problem, alerts can also be issued when new data is found somewhere it shouldn’t be, and regular reports listing all such files can be scheduled for delivery. We were also able to use DGfV to identify dormant data not accessed for a specified period, with options to sort and filter by file size, owner, Tags and the users who had read or written the files involved.
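Finding dormant data boils down to filtering a file inventory on last-access time and sorting what’s left. The sketch below assumes a hypothetical `FileRecord` type holding the attributes the review mentions (size, owner, Tags, last access) and a hypothetical `dormant_files` helper; how DGfV actually collects and stores this metadata isn’t documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata of the kind the review describes."""
    path: str
    size_bytes: int
    owner: str
    last_access: datetime
    tags: frozenset[str] = frozenset()


def dormant_files(records, now, max_idle=timedelta(days=183), tag=None):
    """Return files not accessed within max_idle (roughly six months by default),
    optionally restricted to one Tag, largest first."""
    cutoff = now - max_idle
    hits = [
        r for r in records
        if r.last_access < cutoff and (tag is None or tag in r.tags)
    ]
    return sorted(hits, key=lambda r: r.size_bytes, reverse=True)


# Sample inventory; all names, sizes and dates are made up for illustration.
now = datetime(2024, 6, 1)
records = [
    FileRecord("/share/budget.xlsx", 52_000, "alice", datetime(2023, 9, 3), frozenset({"SSN"})),
    FileRecord("/share/notes.txt", 1_200, "bob", datetime(2024, 5, 20)),
    FileRecord("/share/launch.mp4", 700_000_000, "carol", datetime(2022, 1, 15)),
]
for r in dormant_files(records, now):
    print(r.path, r.owner, r.size_bytes)  # launch.mp4 first, then budget.xlsx
```

Sorting largest first puts the files that would free up the most storage at the top, which ties in with recovering the appliance’s own storage overhead.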

The screenshot below lists data not accessed in our test VM for the last 6 months:

Strange behaviour found

Among its more advanced features, the DGfV analysis engine can spot and protect against anomalous behaviour, chiefly by monitoring how frequently files are being changed and triggering a VM snapshot if this exceeds a configurable threshold. Ransomware is an increasingly common cause of this kind of high change rate, and taking a snapshot before too many files have been maliciously encrypted makes it easier to recover from such an attack.
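A change-rate trigger of this kind can be pictured as a sliding-window counter. This hypothetical sketch counts file changes over the last `window_seconds` and calls a snapshot callback once when the count exceeds `threshold`; the `ChangeRateMonitor` class, its parameters and its re-arming behaviour are assumptions for illustration, not DataGravity’s code.

```python
from collections import deque
from typing import Callable


class ChangeRateMonitor:
    """Hypothetical sliding-window change-rate trigger (not DataGravity's code)."""

    def __init__(self, threshold: int, window_seconds: float,
                 on_trigger: Callable[[float], None]) -> None:
        self.threshold = threshold      # max changes tolerated per window
        self.window = window_seconds    # length of the sliding window
        self.on_trigger = on_trigger    # e.g. a function that requests a VM snapshot
        self.events: deque = deque()    # timestamps of recent changes
        self.armed = True               # fire at most once per burst

    def record_change(self, timestamp: float) -> None:
        self.events.append(timestamp)
        # Discard changes that have fallen out of the sliding window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        if len(self.events) > self.threshold:
            if self.armed:
                self.armed = False
                self.on_trigger(timestamp)
        else:
            self.armed = True  # rate is back to normal; re-arm for the next burst


snapshots = []
monitor = ChangeRateMonitor(threshold=100, window_seconds=60, on_trigger=snapshots.append)

for t in range(0, 300, 10):  # normal activity: one change every 10 seconds
    monitor.record_change(t)
for _ in range(150):         # simulated burst: 150 changes within the same second
    monitor.record_change(300)

print(snapshots)  # [300]
```

Firing once per burst rather than on every change avoids a flood of snapshots while an attack is still under way; the snapshot taken at the start of the burst is the one worth rolling back to.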

Similarly, should a user attempt to steal data, this can be flagged up by looking for excessive reads, with the option to monitor individual user activity as part of a wider forensic investigation. DGfV also builds catalogues listing the files changed in each snapshot, along with details of who made the changes and when. It integrates with Windows Active Directory (AD) to establish user identities, and with the guest OS to provide operator-friendly file, directory and volume names.
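Building a per-snapshot catalogue amounts to comparing the file manifests of consecutive snapshots. In this hypothetical sketch each manifest maps a path to a content digest plus the user and time of the last change (the kind of identity DGfV resolves via AD); the `Entry` type and `build_catalog` function are illustrative assumptions, not the product’s own format.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    digest: str       # content hash of the file in this snapshot
    modified_by: str  # user identity, e.g. resolved via Active Directory
    modified_at: str  # ISO 8601 timestamp of the last change


def build_catalog(previous: dict, current: dict):
    """Hypothetical sketch: list files added, modified or deleted between two snapshots."""
    catalog: list[tuple[str, str, Optional[str], Optional[str]]] = []
    for path, entry in sorted(current.items()):
        old = previous.get(path)
        if old is None:
            catalog.append(("added", path, entry.modified_by, entry.modified_at))
        elif old.digest != entry.digest:
            catalog.append(("modified", path, entry.modified_by, entry.modified_at))
    # A manifest alone can't say who deleted a file; that would need audit data.
    for path in sorted(previous.keys() - current.keys()):
        catalog.append(("deleted", path, None, None))
    return catalog


# Made-up manifests for two consecutive snapshots of a Windows guest.
snap1 = {
    r"C:\Finance\budget.xlsx": Entry("a1f3", r"CORP\alice", "2024-03-01T09:00"),
    r"C:\HR\staff.docx": Entry("b7e2", r"CORP\bob", "2024-02-11T14:30"),
}
snap2 = {
    r"C:\Finance\budget.xlsx": Entry("c9d0", r"CORP\alice", "2024-03-04T10:15"),
    r"C:\Finance\forecast.xlsx": Entry("d4a8", r"CORP\carol", "2024-03-04T11:00"),
}
catalog = build_catalog(snap1, snap2)
for row in catalog:
    print(row)
```

Comparing digests rather than timestamps means a file is only listed as modified when its content actually changed.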

Restore capabilities to recover individual files from a snapshot directly into a running VM are also available, although to get this and the other advanced features you’ll need a Standard edition licence, which seems a little expensive at $167 per VM/month compared to $83 for the Essentials edition. Volume discounts, however, are available for both.

Although we only had a limited amount of test data to work with, we found DataGravity for Virtualization pretty slick and easy to understand and see no reason why it shouldn’t scale to handle large volumes of data across hundreds of VMs. Moreover, the insights it provides are of undoubted value, particularly in mid-range businesses wanting to better understand and protect their unstructured data. Additional tools to directly manage data and take action based on DGfV reports would be good, but even without these the product has a lot to offer. Plus it’s a relatively new kid on the block, only released in January, and can only get better as more customers start to use it and provide feedback on possible enhancements.
