Windows Server 8 data deduplication: What you need to know

Data deduplication in Windows Server 8 provides storage space savings for Windows servers running NTFS volumes. Read an overview of this feature.

Many of the new features in Windows Server 8 will deliver immediate benefits to server administrators. One of the features I learned about at the Windows Server 8 Reviewers Workshop last week is a deduplication engine that can be enabled for NTFS volumes. Deduplication can be implemented in various ways; in the case of the Windows Server implementation, it is post-process deduplication that is enabled per volume and managed through PowerShell.
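
For reference, here is a minimal sketch of turning the feature on, assuming the cmdlet and feature names from the Windows Server 8 beta; the E: volume is a hypothetical example.

    # Install the data deduplication role service (assumes the beta's
    # FS-Data-Deduplication feature name).
    Import-Module ServerManager
    Add-WindowsFeature FS-Data-Deduplication

    # Enable deduplication on a data volume; the system volume is not eligible.
    Enable-DedupVolume -Volume "E:"

    # Optionally restrict deduplication to files older than a given number of days.
    Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 5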

The basic construct of Windows deduplication is variable-size chunking, with chunks ranging between 32 KB and 128 KB. Duplicate chunks are copied to a chunk store that is managed by Windows and kept in the System Volume Information area of the disk; this means we can't see the inner workings of the deduplication engine. Figure A shows deduplication being applied to two files.

Figure A

In Figure A, data blocks A, B, and C are deduplication candidates. When the deduplication engine runs, an eligible file has its duplicate blocks copied into the chunk store. From there, the file has two dimensions to its data: a sparse region and a reparse region. The reparse regions point to the chunk store to access the common chunks, or deduplicated data.

The deduplication processes run via Windows scheduled tasks, or they can be run interactively via PowerShell. The Get-DedupStatus cmdlet will quickly show the percentage of deduplication on a single volume. The scheduled tasks scour the volume for deduplication candidates and then coordinate their movement into the chunk store. Windows Server 8 data deduplication is not allowed to run on the C: (system) drive. When I spoke with Microsoft product managers about the reasoning for this restriction, I was told it was to keep the system experience high, because any deduplication engine has overhead, no matter what any vendor says about it.
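
As a sketch, again assuming the beta cmdlet names, an interactive run and status check might look like this; the E: volume is hypothetical.

    # Start an optimization (deduplication) job now rather than
    # waiting for the scheduled task.
    Start-DedupJob -Volume "E:" -Type Optimization

    # Watch job progress.
    Get-DedupJob

    # Report savings once the job completes.
    Get-DedupStatus -Volume "E:"

    # List the schedules the feature registered.
    Get-DedupSchedule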

My observations on the deduplication process for Windows:

  • If a file is fully deduplicated, it will only consume 4.00 KB on disk. That covers only the metadata sections of the file; all of the data referenced by the reparse regions resides in the chunk store.
  • When you view a file's properties in Windows Explorer (even for files not on a deduplicated volume), the size and size-on-disk values may show some new behavior. These two values will show the fully hydrated size of the file and the deduplicated size consumption (outside of the chunk store), respectively; see the sketch after this list.
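
As a hedged sketch of how a script might compare those two values, the Win32 GetCompressedFileSizeW API reports actual on-disk allocation for sparse, compressed, or deduplicated files; the file path below is a hypothetical example.

    # P/Invoke GetCompressedFileSizeW, which returns on-disk allocation.
    $sig = '[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)] public static extern uint GetCompressedFileSizeW(string lpFileName, out uint lpFileSizeHigh);'
    Add-Type -MemberDefinition $sig -Name NativeDisk -Namespace Win32

    function Get-SizeOnDisk([string]$Path) {
        $high = [uint32]0
        $low  = [Win32.NativeDisk]::GetCompressedFileSizeW($Path, [ref]$high)
        ([uint64]$high * 4294967296) + $low   # combine high and low 32-bit halves
    }

    $file    = 'E:\data\sample.vhd'      # hypothetical file on a dedup volume
    $logical = (Get-Item $file).Length   # fully hydrated size
    $onDisk  = Get-SizeOnDisk $file      # size on disk, outside the chunk store
    '{0:N0} bytes logical, {1:N0} bytes on disk' -f $logical, $onDisk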

Does the Windows Server 8 deduplication feature interest you? Share your comments.

About

Rick Vanover is a software strategy specialist for Veeam Software, based in Columbus, Ohio. Rick has years of IT experience and focuses on virtualization, Windows-based server administration, and system hardware.

Comments
Rottman3D

Switch Windows to ZFS. So nice to work with and so easy to manage.

simon.laird

Thomas (above) is correct, the diagram is wrong. Before de-dup, File 1 contains A, B, C, M and N. After de-dup, File 1 contains A, B, C, X and Y. The same error has been made on File 2. I hope this isn't a reflection on how well the new feature will work :) Other than the diagram, I'd be a bit scared to use this kind of feature until it has been tried and tested. I manage a few terabytes of user data, and if it's all de-duped and something goes wrong, then I'm looking at a long time to get it all back from tape. Also, I didn't think Microsoft regarded disk space as a premium these days, so who exactly is this feature aimed at?

thomas.schedl

If I have understood it well from reading, there is an error in Figure A: File 1's sparse regions should point to blocks M and N in the chunk store, and File 2's sparse regions should point to blocks X and Y in the chunk store.

zyzygy

One app that could use this is Exchange 2010. It now keeps one copy per mailbox of an email that was sent to multiple mailboxes; previous versions kept only one copy. Thus the Exchange mail store size will explode under 2010.

Joanne Lowery

Currently malware can have a field day with the System Volume store area. I imagine the dedup metadata file must run a DB engine of sorts to maintain the links between files and locations. If malware hits the Sys Vol information, would that write off the metadata DB? If so, is there some mitigation or remediation tool (apart from backups) that will fix the links?

Snuffy.

Why does this bring back bad memories of DOS 6.0 and DoubleSpace? A lot of people lost disk volumes to that poorly designed and buggy disk compression product. I hope history isn't repeating itself... - Snuffy -

Gabrics

Maybe they are thinking about VMs/cloud? In that case this could be very useful, and much better than the present option available, the differential VHD, as the file itself would stay independent. Running 100x Win2008R2 would be 1/100 of its size, and that is a lot on an SSD setup. Lots of similar sectors. Would be interesting to know how to set this thing up, because if you need 100% of the space first, then it is not that great :) Just my 2c anyway.

pgit

I imagine you'd have to have a lot of data before this would be worth the risk. And I do see the potential for disaster with this. One thing I'd want to know is what implications this has for backups. Are you regularly backing up global system volume information? I can see where this would be helpful if you're maintaining mirrors for fail over or availability, or have vast swaths of data chewing up a lot of wattage to maintain it, i.e. enterprise situations. I think this is an interesting feature and a worthy effort for those who can benefit from it. I just don't see any of my clients needing this, most have a few hundred GB of data or less.

frank.domnick

I think there are two very essential questions: #1: How much does it slow down the server? #2: How reliable is it? If answer #1 does not tend toward 0%, or answer #2 does not tend toward 100%, then I'll keep my hands - and my servers - off of it.

dechiarag

What deduplication software does this use? Did Microsoft buy some software or integrate a third-party solution? Microsoft has never done deduplication before, so how will this new feature work?

pivert

That's number 1 on the "how to slow down your server" list. If it's not in your storage box's hardware, it will cause problems. Ask some NetApp clients :-)

b4real

I'll find out how defrag will work on dedupe'd NTFS volumes.

Neon Samurai

From my reading so far, deduplication and defragmentation do not mix well, and NTFS does not defrag on its own. Is there any word on how Windows defrag and Windows dedup will cooperate, and how third-party defrag tools will integrate?

pgit

...which is why I could see using this in a large enterprise scenario. Multiple disks per volume, RAID, live mirrors for fail over, tons and tons of data... in that scenario a 2% improvement in disk efficiency starts to add up; quicker seek/read/write times, less to mirror and back up, etc etc.

b4real

I don't think Microsoft would recommend it just yet. Hard to put a vibe on it, but with the dedupe not being real time, I'd be a little wary of it. I want to benchmark first.

xire

I can see this being useful in a home situation as well. I have a crazy amount of pictures and a pretty lazy attempt at a backup solution. I'd be willing to bet it would save me 100 GB of the TB worth of data I have.

bulk

Microsoft has had a form of deduplication for quite a few releases. It's known as "Single Instance Store" and was, I believe, around in W2K3 server times. As far as I know, it's not a bought-in or third-party solution. It only ever appeared in "Windows Storage Server," which was sold as an OEM product to hardware vendors offering NAS-like storage boxes. With Windows Server 2008 R2, it's just an extra DVD containing the feature (or role, I don't remember) that you install after a normal server installation. I'm running it on my main data server, which has duplicated music, photo and image files, and it seems to work well. I've simply "forgotten it's there". RS

ChrisHyche@AlabamaOne.Org

Windows Home Server v1 has dedup, though it is just in the backup system and not the whole file system.

pgit

I do tweak some systems for musicians that have many TB of data, I suppose I could consider them a "home" type system, they're just playing with it because they can. I'm pretty sure a few folks I know have a substantial enough data set to warrant looking into this. One problem off the top of my head is the people I can think of that have a ton of music, movies or what have you, are running Linux :p I will dork with this in the lab and at least get myself familiar.

SmartyParts

Block-level deduplication is far more efficient. We see about an 18% dedup level on SIS stores at our office. Testing with block-level dedup on a VMware storage NAS is hitting 90%+. Note this testing was with OpenDedup, not the Windows solution, but I feel the results should be similar.