A Novel Approach to Data Deduplication Over the Engineering-Oriented Cloud Systems
This paper presents a deduplicated storage system for engineering-oriented cloud computing platforms. The authors' deduplication storage system, which manages data and duplicates across the cloud system, consists of two major components: a front-end deduplication application and a back-end mass storage system. Hadoop Distributed File System (HDFS) is a distributed file system commonly used in the cloud, often together with the Hadoop Database (HBase). The authors use HDFS to build the mass storage system and employ HBase to build a fast indexing system. Combined with the deduplication application, these components allow a scalable, parallel deduplicated cloud storage system to be built effectively. The authors further use VMware to generate a simulated cloud environment.
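
The division of labor described above, a front-end application that consults a fast index before writing to mass storage, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state how duplicates are detected, so SHA-256 fingerprints of whole files are used here, and plain in-memory dictionaries stand in for the HBase index and the HDFS store. The class and method names (`DedupStore`, `put`, `get`) are hypothetical and do not come from the paper.

```python
import hashlib


class DedupStore:
    """Toy model of a deduplicating front end (illustrative only).

    `index` stands in for the HBase fingerprint index and `blobs`
    stands in for HDFS; both are plain dicts here, not real clients.
    """

    def __init__(self):
        self.index = {}   # fingerprint -> storage location (HBase stand-in)
        self.blobs = {}   # storage location -> bytes (HDFS stand-in)
        self.names = {}   # logical file name -> fingerprint

    def put(self, name, data):
        """Store `data` under `name`; return True if new bytes were written."""
        # Assumption: whole-file SHA-256 is the duplicate test.
        fingerprint = hashlib.sha256(data).hexdigest()
        is_new = fingerprint not in self.index
        if is_new:
            # Content not seen before: write it once and record its location.
            location = "/dedup/" + fingerprint
            self.blobs[location] = data
            self.index[fingerprint] = location
        # Either way, the file name only records a pointer to the content.
        self.names[name] = fingerprint
        return is_new

    def get(self, name):
        """Return the bytes stored for `name` by following its fingerprint."""
        return self.blobs[self.index[self.names[name]]]
```

In this sketch, storing the same content under two different names writes the bytes once; the second `put` returns `False` and only adds a name-to-fingerprint entry. A real deployment would also need to handle concurrent writers and deletion, which this sketch leaves out.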