Boosting the Efficiency in Similarity Search on Signature Collections

Computing all pairs of signatures whose Hamming distance (the number of bit positions in which they differ) is less than or equal to a given threshold is an important problem in many applications that maintain large signature collections. In this paper, the authors use MapReduce-based parallelization to enable scalable similarity search over such signatures. A roadblock to applying the MapReduce framework to this problem, however, is that merging and sorting the intermediate key-value pairs produced by multiple mappers can become prohibitively expensive when those pairs do not fit in main memory.
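The abstract does not describe the authors' algorithm, so the following is only a reference sketch of the problem itself. It first defines the task as a brute-force all-pairs Hamming join. It then simulates a single-process MapReduce job that uses pigeonhole segment partitioning, a common filtering technique swapped in here for illustration and not necessarily the method in the paper. All function names (`hamming`, `brute_force_pairs`, `map_phase`, `mapreduce_pairs`) are hypothetical, signatures are assumed to be Python integers of `nbits` bits, and an in-memory dictionary stands in for the distributed shuffle/sort phase.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two integer signatures differ."""
    return bin(a ^ b).count("1")


def brute_force_pairs(sigs, k):
    """Baseline: check every pair (i < j), O(n^2) distance computations."""
    return {(i, j) for (i, a), (j, b) in combinations(enumerate(sigs), 2)
            if hamming(a, b) <= k}


def map_phase(sig_id, sig, nbits, k):
    """Hypothetical mapper: split the signature into k + 1 contiguous segments
    and emit ((segment index, segment bits), sig_id). By the pigeonhole
    principle, two signatures within distance k agree exactly on at least
    one of the k + 1 segments, so they will share at least one key."""
    parts = k + 1
    bounds = [nbits * p // parts for p in range(parts + 1)]
    for p in range(parts):
        lo, hi = bounds[p], bounds[p + 1]
        segment = (sig >> lo) & ((1 << (hi - lo)) - 1)
        yield (p, segment), sig_id


def mapreduce_pairs(sigs, nbits, k):
    """Single-process simulation of map -> shuffle -> reduce."""
    # Shuffle stand-in: group intermediate values by key. In a real cluster
    # this is the merge/sort step whose cost grows with the intermediate data.
    groups = defaultdict(list)
    for sig_id, sig in enumerate(sigs):
        for key, value in map_phase(sig_id, sig, nbits, k):
            groups[key].append(value)
    # Reduce: verify candidate pairs within each group; the set removes
    # pairs that were found under more than one shared segment.
    result = set()
    for ids in groups.values():
        for i, j in combinations(sorted(ids), 2):
            if hamming(sigs[i], sigs[j]) <= k:
                result.add((i, j))
    return result


if __name__ == "__main__":
    sigs = [0b10110010, 0b10110011, 0b01001101, 0b10100011]
    print(sorted(mapreduce_pairs(sigs, nbits=8, k=2)))  # [(0, 1), (0, 3), (1, 3)]
```

Each signature emits k + 1 intermediate pairs in this sketch, so the volume of intermediate data grows with both the collection size and the threshold. In an actual MapReduce deployment these are the pairs that must be merged and sorted before the reducers run, which is where the memory pressure described above arises.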

Provided by: International Journal of Emerging Technology and Advanced Engineering (IJETAE)
Topic: Data Management
Date Added: May 2013
Format: PDF
