
What can Windows Single Instance Storage do for you?


In Windows Storage Server 2003 R2, Microsoft introduced Windows Single Instance Storage (SIS). For years, Microsoft's Exchange Server has used SIS to store mail more efficiently in the Exchange environment. For example, under Exchange's SIS implementation, when a user sends a message to 100 people with mailboxes in the same store, the message is stored only once. A pointer to this stored message is inserted into each user's mailbox, removing the need to store the entire message multiple times. This process can result in significantly lower storage requirements. Microsoft has also used SIS in Microsoft Windows 2000 Remote Installation Services and is expanding its use in other products.

Single Instance Storage will also be included in Windows Server 2008, but only in the Storage edition. The feature will not be made available in other editions.

How does SIS work in Windows?

In Exchange, SIS is message-based. As I mentioned, pointers are used to direct requests for a message to the original copy of the message. With Windows Storage Server, SIS works on a file basis. A process called the SIS Groveler searches through NTFS-based file systems looking for identical files. Once duplicate files are located, they are moved to the SIS Common Store by a component called the SIS Storage Filter. There is a SIS Common Store for each SIS-managed volume. For each file that is moved into the SIS Common Store, a SIS Link is inserted into the file system (also by the SIS Storage Filter) in place of the original file. This SIS link is completely transparent to applications that may be accessing the file, which is actually located in the SIS Common Store. The SIS Storage Filter also handles client redirection to the version of the file stored in the SIS Common Store.
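The Groveler's job of locating identical files can be illustrated with a short sketch. This is not how SIS itself is implemented; it is a minimal Python approximation that buckets files by size and then compares SHA-256 hashes of same-sized files. The names `file_digest` and `find_duplicate_groups`, and the choice of hash, are assumptions made for illustration only.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root):
    """Group files under root whose contents are identical.

    Hypothetical helper: files are bucketed by size first (cheap),
    then by content hash, so only same-sized files are ever hashed.
    """
    by_size = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Each returned group is a list of paths with identical content. In SIS terms, these are the candidates that would be replaced by links to a single copy in the common store.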

The links are placed into the file system as sparse files of the same listed size as the original, but with no disk space actually allocated. Each SIS link carries a reparse point, which holds data including the name of the original file and a unique identifier for the link.
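Because SIS links are described as sparse files carrying a reparse point, one rough way to spot candidates is to check a file's NTFS attribute flags. The sketch below uses Python's `stat.FILE_ATTRIBUTE_REPARSE_POINT` and `stat.FILE_ATTRIBUTE_SPARSE_FILE` constants. The helper names are hypothetical, and the flags alone do not prove a file is a SIS link, since other features also use reparse points.

```python
import os
import stat


def describe_link_attributes(attrs):
    """Decode the reparse-point and sparse-file flags from an attribute bitmask."""
    return {
        "reparse_point": bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT),
        "sparse_file": bool(attrs & stat.FILE_ATTRIBUTE_SPARSE_FILE),
    }


def inspect_path(path):
    """Read a file's attribute flags without following links (Windows only)."""
    st = os.lstat(path)
    attrs = getattr(st, "st_file_attributes", None)
    if attrs is None:
        # st_file_attributes is populated only on Windows.
        raise OSError("st_file_attributes is only available on Windows")
    return describe_link_attributes(attrs)
```

On a Windows system, `inspect_path` could be run against a file on a SIS-managed volume; a file with both flags set would be consistent with the description of a SIS link above.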

When a SIS-managed file is modified and a user saves the file, the new file is written to the file system and not into the SIS Common Store. Other users accessing the file continue to be served by the original SIS-housed version. Note also that identical files in different locations retain the access rights of their original locations.
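The save behavior described here can be modeled with a toy in-memory volume. This is only a sketch of the idea, not the SIS Storage Filter's actual logic; the class `ToyVolume` and its methods are invented for illustration.

```python
import hashlib


class ToyVolume:
    """In-memory model: a path is either an ordinary file or a link into a common store."""

    def __init__(self):
        self.common_store = {}  # content hash -> bytes (one copy per unique content)
        self.links = {}         # path -> content hash (stands in for a SIS link)
        self.private = {}       # path -> bytes (ordinary, unlinked files)

    def write(self, path, data):
        """Saving produces an ordinary file; any existing link for the path is dropped."""
        self.links.pop(path, None)
        self.private[path] = data

    def read(self, path):
        """Reads through a link are redirected to the common store."""
        if path in self.links:
            return self.common_store[self.links[path]]
        return self.private[path]

    def grovel(self):
        """Move content shared by two or more ordinary files into the common store."""
        by_hash = {}
        for path, data in self.private.items():
            by_hash.setdefault(hashlib.sha256(data).hexdigest(), []).append(path)
        for digest, paths in by_hash.items():
            if len(paths) < 2:
                continue
            for path in paths:
                self.common_store[digest] = self.private.pop(path)
                self.links[path] = digest
```

After two identical files are groveled, writing new data to one path leaves the other path still served from the common store, which mirrors the behavior the paragraph describes.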

SIS can monitor up to six separate NTFS volumes. For maximum benefit, if you have more than six volumes on a server to be managed with SIS, you should choose volumes that have the best chance for duplicates. Microsoft estimates that SIS can reduce storage by 25 to 40 percent.
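To make the six-volume limit and the 25 to 40 percent estimate concrete, here is a small sketch. The volume names, the duplicate-byte figures, and the 1,000 GB starting size in the usage note are made-up inputs; in practice the duplicate estimate would have to come from your own survey of each volume.

```python
MAX_SIS_VOLUMES = 6  # limit stated in the article


def pick_volumes(duplicate_bytes_by_volume, limit=MAX_SIS_VOLUMES):
    """Return the volumes with the most estimated duplicate data, up to the limit."""
    ranked = sorted(duplicate_bytes_by_volume.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


def projected_usage_gb(used_gb, reduction_percent):
    """Projected space used after a given percentage reduction."""
    return used_gb * (100 - reduction_percent) / 100
```

For a hypothetical volume holding 1,000 GB, `projected_usage_gb(1000, 25)` and `projected_usage_gb(1000, 40)` bracket the range implied by Microsoft's estimate: between 750 GB and 600 GB used after SIS.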

If an administrator disables the SIS Storage Filter service, it will result in access to SIS-based files being disabled.

Summary

The concept of SIS isn't entirely new, but it's finding its way into more and more products. With Windows Storage Server, Microsoft is implementing it at the file level and, in Exchange, at the message level. However, it doesn't stop there. With Windows Home Server, for example, Microsoft makes backups more efficient through a SIS-like technology that works at the block level. As SIS continues to grow, maybe the insatiable need for more and more disk space will level off a bit?

About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

4 comments
perezjonestsisah

Yeah would like to see more implementation of SIS in other products where applicable.

saul_rodriguez

Actually, SIS first became available for NTFS in Windows Storage Server 2003 R2, which is what I have. I haven't implemented it yet because I don't have a clear understanding of its compatibility with backup utilities (e.g., ARCserve, Backup Exec) and possible gotchas on backup/restore of volumes with SIS enabled. For example, what happens when restoring a file with a different timestamp from one that exists in the SIS storage? Or what happens when restoring a file to a user directory that is identical to one in the SIS storage?

pete.caviness

The core SIS engine was released as part of the Windows 2000 operating system. It was focused solely on Remote Installation Services, so few administrators have firsthand knowledge of the feature. SIS uses standard features of NTFS, so every backup and anti-virus product should work with reparse points and sparse files. If you see problems, I would ask the backup vendor why it has taken 8 years to work out the bugs. If you look through the GUI of every backup product, you will find support for "Remote Storage". Remote Storage also uses reparse points and abstracts the true location of the file data. If you tell your backup application to protect data in Remote Storage, it will back up the real data. Otherwise, it will only protect the reparse point. On a restore, the reparse points need to have access to the backend data. The SIS-backed data is also protected during the backup by the backup application.

WayneAndersen

I'm concerned about this sentence: "If an administrator disables the SIS Storage Filter service, it will result in access to SIS-based files being disabled." What are the chances of this happening? Can this happen inadvertently, thereby depriving users of access to their files? Does this work okay with third-party systems, such as backup systems? What happens if you have to restore a SIS-based file? What happens if the file is backed up, one user changes his version of the file, and then the file is restored? Which version of the file is restored?
