Chinese University of Hong Kong
De-duplication effectively eliminates duplicate data, yet it introduces fragmentation that degrades read performance. The authors propose RevDedup, a de-duplication system that optimizes reads of the latest backups of Virtual Machine (VM) images using reverse de-duplication. In contrast to conventional de-duplication, which removes duplicates from new data, RevDedup removes duplicates from old data, thereby shifting fragmentation to old data while keeping the layout of new data as sequential as possible. They evaluate their RevDedup prototype using real-world VM image snapshots of 160 users over a 12-week span. They show that RevDedup achieves high de-duplication efficiency, high backup throughput, and high read throughput.
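
The idea of removing duplicates from old data rather than new data can be illustrated with a toy sketch. This is not the RevDedup implementation: the class name `ReverseDedupStore`, the fixed 4-byte block size, the SHA-1 fingerprints, and the in-memory layout are all assumptions made for illustration. Each new backup is stored in full and in order; only the previous backup is rewritten so that its duplicate blocks become references into the newer one.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size (assumption; real systems use far larger units)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut a backup image into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to detect duplicate blocks."""
    return hashlib.sha1(block).hexdigest()


class ReverseDedupStore:
    """Hypothetical toy store: the newest version is always kept whole."""

    def __init__(self):
        # Each version is a list of entries, either
        # ("data", block_bytes) or ("ref", version_index, block_index).
        self.versions = []

    def write(self, data: bytes) -> None:
        new_blocks = split_blocks(data)
        if self.versions:
            # Index the NEW version's blocks by fingerprint.
            index = {}
            for i, block in enumerate(new_blocks):
                index.setdefault(fingerprint(block), i)
            new_version = len(self.versions)
            previous = self.versions[-1]
            # Remove duplicates from the OLD version: replace each block
            # that also appears in the new version with a forward reference.
            for j, entry in enumerate(previous):
                if entry[0] == "data":
                    i = index.get(fingerprint(entry[1]))
                    if i is not None:
                        previous[j] = ("ref", new_version, i)
        # The new version is stored sequentially, with no references.
        self.versions.append([("data", block) for block in new_blocks])

    def read(self, version: int) -> bytes:
        out = []
        for entry in self.versions[version]:
            # Follow reference chains until actual data is reached.
            while entry[0] == "ref":
                entry = self.versions[entry[1]][entry[2]]
            out.append(entry[1])
        return b"".join(out)

    def stored_blocks(self, version: int) -> int:
        """Number of blocks physically kept for a given version."""
        return sum(1 for entry in self.versions[version] if entry[0] == "data")
```

In this sketch, reading the latest version never follows a reference, while older versions accumulate references (and hence fragmentation) as newer backups arrive. A conventional scheme would do the opposite: keep the first copy of each block in place and make the new backup point back to it.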