RankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce

Provided by: RWTH Aachen University
Topic: Data Management
Format: PDF
In this paper, the authors consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets whose index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach combines Locality Sensitive Hashing (LSH) with a MapReduce implementation, a pairing the authors describe as a perfect match by design: the hashing step of LSH can be integrated directly into the map phase of MapReduce. LSH assigns similar objects to the same fragments in the distributed file system, which enables effective selection of potential candidate neighbors; these candidates are then reduced to the set of K nearest neighbors.
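The map/select/reduce flow described above can be illustrated with a small in-process sketch. The abstract does not specify RankReduce's hash family, parameters, or storage layout, so this example assumes p-stable Euclidean LSH (hash h(v) = floor((a·v + b) / w)) with several hash tables, and simulates the distributed fragments with a dictionary. All names (`build_index`, `knn_query`, `num_tables`, `m`, `w`) are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import heapq
import math
import random
from collections import defaultdict


def make_table_hash(dim, m, w, rng):
    """One table key: m concatenated p-stable hashes floor((a.v + b) / w) (assumed LSH family)."""
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(m)]

    def key(v):
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, v)) + b) / w) for a, b in funcs
        )

    return key


def build_index(records, dim, num_tables=4, m=3, w=4.0, seed=0):
    """Map phase: emit each record once per table under (table_id, bucket_key).

    Grouping the emitted pairs by key stands in for the MapReduce shuffle; in a
    cluster, each group would correspond to a fragment in the distributed file system.
    """
    rng = random.Random(seed)
    hashes = [make_table_hash(dim, m, w, rng) for _ in range(num_tables)]
    fragments = defaultdict(list)
    for obj_id, vec in records:
        for t, h in enumerate(hashes):
            fragments[(t, h(vec))].append((obj_id, vec))
    return hashes, fragments


def knn_query(query, hashes, fragments, k):
    """Hash the query to select candidate fragments, then reduce to the k nearest."""
    candidates = {}
    for t, h in enumerate(hashes):
        for obj_id, vec in fragments.get((t, h(query)), []):
            candidates[obj_id] = vec  # deduplicate objects found in several tables
    # Reduce step: rank candidates by Euclidean distance and keep the k closest.
    best = heapq.nsmallest(k, candidates.items(), key=lambda item: math.dist(query, item[1]))
    return [obj_id for obj_id, _ in best]


if __name__ == "__main__":
    rng = random.Random(42)
    records = [(i, [rng.uniform(0, 10) for _ in range(5)]) for i in range(200)]
    hashes, fragments = build_index(records, dim=5)
    # Querying with record 7's own vector: record 7 is always a candidate at distance 0,
    # so it is returned first.
    print(knn_query(records[7][1], hashes, fragments, k=3))
```

Using several tables trades extra storage for recall: an object missed by one table's bucket may still be found through another. Because only objects sharing a bucket with the query are examined, the sketch can return fewer than k results when the selected fragments are sparse, which is the usual approximate-KNN trade-off of LSH.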
