Windows Server 8 deduplication use cases and caveats

In this Windows Server 8 tip, Rick Vanover explains what you should consider before implementing native deduplication on Windows storage.

During a recent Windows Server 8 Reviewers Workshop, I asked Microsoft product teams a number of questions about how Windows Server 8 deduplication should be implemented. One answer was clear-cut: deduplication is not supported on C:\ drives, for experience reasons.

With that information, my thoughts started to wander. For Hyper-V specifically, we now have the option to use deduplication on the Hyper-V host, for the volume that contains the guest virtual machines (VMs). We can also use the deduplication feature within Hyper-V guest VMs (for virtual drives other than the C:\ drive). Further, we could deduplicate both on the Hyper-V host volume and inside the guest VM. The right way to apply deduplication in Windows Server 8 depends on the workload, and deduplicating VMs, attractive as it seems at first glance, may not be the best strategy.

As a blanket recommendation, the deduplication implementation in Windows Server 8 is intended for file content. So, while I was shown some great examples of deduplication with .VHD files, they may not be the best real-world use cases. If a file is accessed constantly and changed internally, a lot of its data may have to be rehydrated from the chunk store. A .VHD file for a running Hyper-V VM changes constantly, which makes it a questionable candidate.
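To make the chunk store idea concrete, here is a minimal Python sketch of chunk-level deduplication. It is a toy model, not how Windows Server 8 stores data: the fixed four-byte chunk size, the ChunkStore class and its optimize and rehydrate methods are all hypothetical names chosen for illustration. It shows two similar files sharing chunks, and an internal change to one file adding new chunks to the store.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, hypothetical chunk size so the example is easy to follow


class ChunkStore:
    """Toy content-addressed chunk store (illustrative only, not the Windows format)."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes, each unique chunk stored once

    def optimize(self, data):
        """Split data into fixed-size chunks and return the file's list of chunk digests."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if not seen before
            recipe.append(digest)
        return recipe

    def rehydrate(self, recipe):
        """Rebuild the original bytes by reading every chunk the recipe points to."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = ChunkStore()
doc_a = b"AAAABBBBCCCCDDDD"
doc_b = b"AAAABBBBCCCCEEEE"  # shares three of its four chunks with doc_a

recipe_a = store.optimize(doc_a)
recipe_b = store.optimize(doc_b)
chunks_after_two_files = len(store.chunks)

# An internal change to doc_a produces a chunk the store has not seen before.
doc_a_changed = b"AAAAXXXXCCCCDDDD"
recipe_a_changed = store.optimize(doc_a_changed)
chunks_after_change = len(store.chunks)
```

In this sketch the two files need eight chunks between them but only five are stored, and the one-chunk edit grows the store to six. A file that changes constantly keeps producing new chunks and keeps being read back through rehydrate, which is the concern with a busy .VHD.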

The Windows team put file server data volumes at the top of the list of candidates for deduplication, which raised the question: Should we run deduplication against SQL and Exchange volumes? As with VMs, the answer is complicated. Exchange and SQL data is highly structured, and each application may have built-in efficiencies that reduce the benefit of having Windows deduplicate the volume. The classic example is the big email attachment sent to many recipients; depending on the version, Exchange may already single-instance that data within the database.

PowerShell is such a common theme throughout Windows Server 8 that I think almost anything can be done with it, including configuring volume deduplication.

One glaring omission with Windows Server 8's deduplication is that it cannot be configured in Group Policy. While admins could use Group Policy to deploy the scripts that enable deduplication on a volume, that's not a true Group Policy solution. Further, we could use Group Policy to assign scheduled tasks that manage the three jobs driving the deduplication engine: background optimization, weekly garbage collection and weekly scrubbing. Collectively, these jobs find duplicate data and remove chunks from the chunk store that belonged to files since removed from the volume.
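The garbage collection step is easier to picture with a small example. The following Python sketch is only an illustration of the idea, under my own assumptions: the garbage_collect function, the chunk IDs and the file names are hypothetical, and the real engine's on-disk structures are not described here. Each file is represented by a list of chunk IDs, and any chunk that no remaining file references is dropped.

```python
def garbage_collect(chunk_store, recipes):
    """Delete chunks that no remaining file references; return how many were removed."""
    live = {chunk_id for recipe in recipes.values() for chunk_id in recipe}
    dead = [chunk_id for chunk_id in chunk_store if chunk_id not in live]
    for chunk_id in dead:
        del chunk_store[chunk_id]
    return len(dead)


# Hypothetical chunk store: chunk ID -> chunk data.
chunk_store = {"c1": b"AAAA", "c2": b"BBBB", "c3": b"CCCC"}

# Hypothetical per-file recipes: file name -> chunk IDs that make up the file.
recipes = {
    "report.docx": ["c1", "c2"],
    "copy.docx": ["c1", "c3"],
}

# Deleting a file removes only its recipe; its chunks stay until collection runs.
del recipes["copy.docx"]
removed = garbage_collect(chunk_store, recipes)
```

Here only chunk c3 is reclaimed, because c1 is still referenced by report.docx. That is why deleting files from a deduplicated volume does not free space until garbage collection has run.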

Given these caveats, does deduplication on Windows volumes appeal to you? For file servers, I'd probably go for it initially, but I'm not sure if I would for many other volume types. How about you? Share your comments below.