Provided by: AICIT
Topic: Big Data
Existing data routing schemes for deduplication clusters have not addressed data read performance, even though it is well known that reads in deduplication systems incur non-trivial random disk seeks that significantly degrade read performance. In this paper, the authors propose SORT, a Similarity-Ownership based Routing scheme that exploits both data similarity and data ownership to improve read performance in deduplication clusters. Their experiments on real-world datasets show that SORT reduces random disk seeks by about 10% at a cost of only 0.11% in deduplication efficiency, achieving a better tradeoff between deduplication efficiency and read performance than other existing routing schemes.