Bayesian Locality Sensitive Hashing for Fast Similarity Search

Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks the authors to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-Sensitive Hashing (LSH) based methods have become a very popular approach for this problem. However, most such methods only use LSH for the first phase of similarity search - i.e. efficient indexing for candidate generation. In this paper, they present BayesLSH, a principled bayesian algorithm for the subsequent phase of similarity search - performing candidate pruning and similarity estimation using LSH.

Provided by: VLD Digital Topic: Big Data Date Added: Aug 2012 Format: PDF

Find By Topic