A Fast Dual-Level Fingerprinting Scheme for Data Deduplication
Data deduplication has attracted recent interest in the research community. Several approaches have been proposed that eliminate duplicate data first at the file level and then at the chunk level, which reduces the duplicate-lookup complexity. To meet high-throughput requirements, this paper proposes a Fast Dual-level Fingerprinting (FDF) scheme that fingerprints a dataset at both the file level and the chunk level in a single scan of the contents. FDF breaks the fingerprinting process into task segments and leverages the computing resources of modern multi-core CPUs to pipeline the time-consuming operations.
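The single-scan idea can be illustrated with a minimal sketch. This is not the FDF implementation: the fixed chunk size, the use of SHA-1 for both levels, the two-stage thread layout, and all function names are assumptions made for illustration only. The first function reads each chunk once and feeds it to both an incremental file-level hash and a per-chunk hash. The second moves chunk hashing to a worker thread, a simplified stand-in for pipelining the time-consuming operations across cores.

```python
import hashlib
import io
import queue
import threading

# Assumption: fixed-size chunking. The chunking method used by FDF is not
# described in this section.
CHUNK_SIZE = 4096


def dual_level_fingerprints(stream, chunk_size=CHUNK_SIZE):
    """Return (file_fingerprint, chunk_fingerprints) from one pass over stream."""
    file_hash = hashlib.sha1()
    chunk_fps = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # The same bytes feed both levels, so the data is scanned only once.
        file_hash.update(chunk)
        chunk_fps.append(hashlib.sha1(chunk).hexdigest())
    return file_hash.hexdigest(), chunk_fps


def pipelined_fingerprints(stream, chunk_size=CHUNK_SIZE):
    """Same result as above, with chunk hashing done in a worker thread."""
    pending = queue.Queue(maxsize=64)  # bounded, so the reader cannot run far ahead
    chunk_fps = []

    def chunk_worker():
        # A single worker consuming a FIFO queue preserves chunk order.
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: no more chunks
                break
            chunk_fps.append(hashlib.sha1(chunk).hexdigest())

    worker = threading.Thread(target=chunk_worker)
    worker.start()
    file_hash = hashlib.sha1()
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            pending.put(chunk)       # stage 2: chunk-level fingerprint
            file_hash.update(chunk)  # stage 1: file-level fingerprint
    finally:
        pending.put(None)
        worker.join()
    return file_hash.hexdigest(), chunk_fps


if __name__ == "__main__":
    data = b"abc" * 5000
    file_fp, chunk_fps = pipelined_fingerprints(io.BytesIO(data))
    print(file_fp, len(chunk_fps))
```

In a deduplication system the file-level fingerprint would typically be checked first, and the chunk-level fingerprints consulted only when the whole file is not already known to be a duplicate.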