Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality
Source: University of California
The authors present sparse indexing, a technique that uses sampling and exploits the inherent locality within backup streams to solve for large-scale backup (e.g., hundreds of terabytes) the chunk-lookup disk bottleneck problem that inline, chunk-based deduplication schemes face. The problem is that these schemes traditionally require a full chunk index, which indexes every chunk, in order to determine which chunks have already been stored; unfortunately, at scale it is impractical to keep such an index in RAM and a disk-based index with one seek per incoming chunk is far too slow.
| Format: | Size: | 569.80 | |
| Date: | Jan 2009 |



