HANDS: A Heuristically Arranged Non-Backup In-Line Deduplication System
Deduplication is rarely used on primary storage because of the disk bottleneck problem, which results from the need to keep an in-memory index that maps chunks of data to hash values in order to detect duplicate blocks. This index grows with the number of unique data blocks, creating a scalability problem, and at current prices the cost of the additional RAM approaches the cost of the indexed disks. As a result, deduplication ratios previously had to exceed 45% to yield any cost benefit. The HANDS technique that the authors introduce in this paper reduces the amount of in-memory index storage required by up to 99% while still achieving between 30% and 90% of the deduplication of a full memory-resident index, making primary deduplication cost effective in workloads with a low deduplication rate.
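
To make the baseline concrete, the following is a minimal sketch of in-line deduplication with a full memory-resident index, the approach whose memory cost motivates HANDS. It is not the HANDS technique itself. The class name `FullIndexDedupStore`, the helper `write_stream`, the 4 KiB block size, and the choice of SHA-1 as the fingerprint are all illustrative assumptions rather than details taken from the paper.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration; not from the paper


class FullIndexDedupStore:
    """Hypothetical baseline: every unique block has an entry in RAM."""

    def __init__(self):
        self.index = {}   # fingerprint -> physical block number (held in memory)
        self.blocks = []  # stand-in for the disk: one entry per stored block

    def write(self, data: bytes) -> int:
        """Store one block and return the physical block number that holds it."""
        fingerprint = hashlib.sha1(data).digest()
        pbn = self.index.get(fingerprint)
        if pbn is None:
            # Unique block: write it and add one more index entry.
            pbn = len(self.blocks)
            self.blocks.append(data)
            self.index[fingerprint] = pbn
        # Duplicate block: nothing is written, the existing copy is reused.
        return pbn


def write_stream(store: FullIndexDedupStore, payload: bytes) -> list:
    """Split a payload into fixed-size blocks and write each one in-line."""
    return [
        store.write(payload[offset:offset + BLOCK_SIZE])
        for offset in range(0, len(payload), BLOCK_SIZE)
    ]


store = FullIndexDedupStore()
payload = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
mapping = write_stream(store, payload)
```

In this sketch the third block is detected as a duplicate of the first, so only two blocks are stored. The index still holds one entry per unique block, which is the growth behavior that the abstract identifies as the scalability problem.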