Research on Distributional Stability of Chunk Sizes in Data Chunking

Provided by: AICIT
Topic: Big Data
Format: PDF
Data chunking algorithms play an important role in removing repeated data: the operating efficiency of the algorithm and the distribution of the chunk sizes it produces directly affect the performance of the whole system. Currently, the Two Thresholds Two Divisors (TTTD) algorithm chunks data according to digital features of the data content, so that chunk sizes vary with the content, which improves the efficiency of removing repeated data. However, the distribution of chunk sizes produced by TTTD has poor stability: the chunk sizes are too widely dispersed and their unbiased variance is large, which is bad for removing repeated data.
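
To make the quantities in the abstract concrete, the following is a minimal sketch of a TTTD-style chunker together with a measurement of the unbiased variance of the resulting chunk sizes. It is not the implementation studied in the paper. The polynomial rolling hash, the window length, all parameter values and the function names `window_hashes` and `tttd_chunk_sizes` are assumptions chosen for illustration only.

```python
import random
import statistics


def window_hashes(data, window=16):
    """Polynomial rolling hash of the last `window` bytes at every position.

    The hash function is an illustrative assumption, not the one used by TTTD.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    h = 0
    out = []
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod
        h = (h * base + b) % mod
        out.append(h)
    return out


def tttd_chunk_sizes(data, t_min=64, t_max=512, d_main=128, d_backup=64):
    """Return chunk sizes from a TTTD-style rule: two thresholds, two divisors."""
    hashes = window_hashes(data)
    sizes = []
    start = 0        # index of the first byte of the current chunk
    backup = None    # end index of the last backup breakpoint, if any
    i = 0
    n = len(data)
    while i < n:
        length = i - start + 1
        if length >= t_min:  # below the minimum threshold, never cut
            h = hashes[i]
            if h % d_backup == d_backup - 1:
                backup = i + 1  # remember a fallback cut point
            if h % d_main == d_main - 1:
                end = i + 1     # main divisor matched: cut here
            elif length >= t_max:
                # maximum threshold reached: use the backup cut if one exists
                end = backup if backup is not None else i + 1
            else:
                i += 1
                continue
            sizes.append(end - start)
            start, backup, i = end, None, end
            continue
        i += 1
    if start < n:
        sizes.append(n - start)  # trailing chunk may be shorter than t_min
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.randrange(256) for _ in range(20000))
    sizes = tttd_chunk_sizes(data)
    # statistics.variance is the sample (unbiased, n - 1) variance
    print(len(sizes), statistics.mean(sizes), statistics.variance(sizes))
```

In this sketch, the spread of `sizes` reported by `statistics.variance` corresponds to the unbiased variance the abstract refers to; changing the thresholds or divisors changes how widely the chunk sizes are dispersed.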
