Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality

Free registration required

Executive Summary

The authors present sparse indexing, a technique that uses sampling and exploits the inherent locality within backup streams to solve for large-scale backup (e.g., hundreds of terabytes) the chunk-lookup disk bottleneck problem that inline, chunk-based deduplication schemes face. The problem is that these schemes traditionally require a full chunk index, which indexes every chunk, in order to determine which chunks have already been stored; unfortunately, at scale it is impractical to keep such an index in RAM and a disk-based index with one seek per incoming chunk is far too slow.

  • Format: PDF
  • Size: 569.8 KB