Block-Ranking: Content Similarity Retrieval Based on Data Partition in Network Storage Environment

Provided by: AICIT
Topic: Cloud
Format: PDF
Data partitioning now plays an important role in eliminating duplicate data in green storage and cloud storage systems. Fixed-size chunking and content-based chunking are two commonly used partition methods for breaking a file into a sequence of blocks. Meanwhile, the inverted index has become the standard indexing method in modern information retrieval. For ease of analysis, the inverted index is represented as an inverted matrix, and a MapReduce strategy is used to decompose a complex matrix computation into a set of smaller, parallelizable sub-matrix computations.
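The two partition methods and a block-level inverted index can be illustrated with a minimal Python sketch. It is not the paper's Block-Ranking algorithm, which the abstract does not specify. The function names, the toy byte-sum boundary hash, the SHA-1 block fingerprints, and the shared-block-count score are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def fixed_chunks(data: bytes, size: int = 8) -> list[bytes]:
    """Fixed-size chunking: consecutive blocks of `size` bytes (last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data: bytes, window: int = 4, mask: int = 0x0F,
                   min_size: int = 4) -> list[bytes]:
    """Content-based chunking: cut where a hash of the trailing window matches a pattern.

    Assumption: a byte sum stands in for a real rolling hash (e.g. Rabin fingerprint).
    Requires window <= min_size so the window never reaches before the block start.
    """
    chunks, start = [], 0
    for i in range(len(data)):
        if i + 1 - start < min_size:
            continue  # block still too short to cut
        h = sum(data[i + 1 - window:i + 1])  # toy hash of the last `window` bytes
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing block without a boundary match
    return chunks


def build_index(files: dict[str, bytes], chunker) -> dict[str, set[str]]:
    """Inverted index: block fingerprint -> names of files containing that block."""
    index = defaultdict(set)
    for name, data in files.items():
        for block in chunker(data):
            index[hashlib.sha1(block).hexdigest()].add(name)
    return index


def rank(query: bytes, index, chunker) -> list[tuple[str, int]]:
    """Rank files by the number of distinct query blocks they share (assumed score)."""
    scores = defaultdict(int)
    for digest in {hashlib.sha1(b).hexdigest() for b in chunker(query)}:
        for name in index.get(digest, ()):
            scores[name] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


files = {
    "a": b"AAAAAAAABBBBBBBBCCCCCCCC",
    "b": b"AAAAAAAAXXXXXXXXYYYYYYYY",
    "c": b"ZZZZZZZZ",
}
index = build_index(files, fixed_chunks)
ranking = rank(b"AAAAAAAABBBBBBBB", index, fixed_chunks)
```

With 8-byte fixed blocks, the query shares two blocks with file "a" and one with file "b", so "a" ranks first. Viewed as a matrix, each fingerprint is a row and each file a column, which is the form the abstract describes splitting into sub-matrix computations.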
