Many of the new features in Windows Server 8 will deliver immediate benefits to server administrators. One feature that I learned about at the Windows Server 8 Reviewers Workshop last week is a deduplication engine that can be enabled for NTFS volumes. Deduplication can be done in various ways; the Windows Server 8 implementation is post-process deduplication that is enabled per volume and can be managed through PowerShell.
The basic construct of Windows deduplication is a variable-size chunk that ranges between 32 KB and 128 KB. Chunks that are duplicates are copied to a chunk store that is managed by Windows and kept in the System Volume Information section of the disk; this means we can’t see the inner workings of the deduplication engine. Figure A shows Windows deduplication being applied to two files.
In Figure A, data blocks A, B, and C are deduplication candidates. When the deduplication engine runs, an eligible file has its duplicate blocks copied into the chunk store. From there, the file has two dimensions of its data: a sparse region and a reparse region. The reparse regions call to the chunk store to access the common chunks, that is, the deduplicated data.
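To make the Figure A idea concrete, here is a minimal Python sketch of a chunk store. It is only an illustration and not how the Windows engine is implemented: it assumes tiny fixed-size chunks instead of the variable 32 KB to 128 KB chunks described above, and the names `split_chunks`, `deduplicate`, and `rehydrate` are hypothetical. Chunks that appear more than once are moved into the store and replaced by references, while unique chunks stay with the file.

```python
import hashlib

# Toy chunk size in bytes (an assumption for readability; the real feature
# is described as using variable chunks between 32 KB and 128 KB).
CHUNK_SIZE = 4


def split_chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(files):
    """Post-process pass over a dict of {name: bytes}.

    Returns (chunk_store, file_maps). Chunks seen more than once go into
    chunk_store; each file becomes a list of ("ref", digest) or
    ("raw", bytes) entries.
    """
    # First pass: count how often each chunk occurs across all files.
    counts = {}
    for data in files.values():
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            counts[digest] = counts.get(digest, 0) + 1

    # Second pass: move duplicate chunks into the store, keep unique ones.
    chunk_store = {}
    file_maps = {}
    for name, data in files.items():
        entries = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if counts[digest] > 1:
                chunk_store.setdefault(digest, chunk)
                entries.append(("ref", digest))
            else:
                entries.append(("raw", chunk))
        file_maps[name] = entries
    return chunk_store, file_maps


def rehydrate(entries, chunk_store):
    """Rebuild the original bytes by following references into the store."""
    return b"".join(
        chunk_store[value] if kind == "ref" else value
        for kind, value in entries
    )


# Two files that share blocks A, B, and C, as in Figure A.
files = {
    "file1": b"AAAABBBBCCCCDDDD",
    "file2": b"AAAABBBBCCCCEEEE",
}
chunk_store, file_maps = deduplicate(files)
```

In this sketch the store ends up holding one copy each of blocks A, B, and C, and both files can still be read back in full through `rehydrate`.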
The deduplication processes run as Windows scheduled tasks, or you can run them interactively via PowerShell. These scheduled tasks scour the volume for deduplication candidates and then coordinate movement into the chunk store. The Get-DedupStatus command quickly shows the percentage of deduplication on a single volume. Windows Server 8 data deduplication is not allowed to run on C:\ drives. When I asked Microsoft product managers about the reasoning for this restriction, I was told that it is there to keep the experience of the system high. Any deduplication engine has overhead, no matter what any vendor says about it.
My observations on the deduplication process for Windows:
- If a file is fully deduplicated, it will consume only 4.00 KB on disk. That space holds only the metadata sections of the file; all of the reparse (and no parse) regions would be in the chunk store.
- When you click a file (even those not on a deduplicated volume) in Windows Explorer, the size and size on disk values may have some new behavior. These two values will show the fully hydrated size of the file and the deduplicated size consumption (outside of the chunk store).
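
The arithmetic behind these observations can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not the formula that Get-DedupStatus uses: the two 10 MB files are hypothetical, the 4 KB per-file figure comes from the observation above, and `savings_percent` is a made-up helper name.

```python
def savings_percent(logical_bytes, stored_bytes):
    """Percentage of space saved relative to the fully hydrated size."""
    if logical_bytes == 0:
        return 0.0
    return 100.0 * (logical_bytes - stored_bytes) / logical_bytes


# Assumption taken from the observation above: a fully deduplicated file
# keeps only 4 KB of metadata outside the chunk store.
METADATA_BYTES = 4 * 1024

# Hypothetical example: two identical 10 MB files on one volume.
file_sizes = [10 * 1024 * 1024, 10 * 1024 * 1024]

logical = sum(file_sizes)                         # "size" (fully hydrated)
size_on_disk = METADATA_BYTES * len(file_sizes)   # "size on disk" per Explorer
chunk_store_bytes = 10 * 1024 * 1024              # one shared copy of the data

# Total space actually used is the per-file metadata plus the chunk store.
stored = size_on_disk + chunk_store_bytes
pct = savings_percent(logical, stored)
print(f"size: {logical} bytes, size on disk: {size_on_disk} bytes, "
      f"savings: {pct:.2f}%")
```

The point of the sketch is that the size on disk shown for a file leaves out the chunk store, so the real savings on the volume are smaller than the per-file numbers alone suggest.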
Does the Windows Server 8 deduplication feature interest you? Share your comments.