Efficient Processing of k Nearest Neighbor Joins using MapReduce
The k nearest neighbor join (kNN join), which finds the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operation widely used in data mining applications. Because it combines the k nearest neighbor query with the join operation, kNN join is expensive. As data volumes grow, it becomes difficult to perform a kNN join efficiently on a single centralized machine. In this paper, the authors investigate how to perform kNN join using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers.
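
To make the definition concrete, the following is a minimal single-machine sketch of a brute-force kNN join. It is not the MapReduce method studied in the paper. The function name `knn_join`, the choice of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import heapq
import math


def knn_join(R, S, k):
    """Brute-force kNN join (illustrative baseline, not the paper's algorithm).

    For every object r in R, return the k objects of S closest to r.
    Euclidean distance is an assumption; any distance function could be used.
    """
    result = {}
    for i, r in enumerate(R):
        # Scan all of S and keep only the k closest objects to r.
        result[i] = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
    return result


# Hypothetical sample data: 2-D points.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (10.0, 10.0)]

joined = knn_join(R, S, k=2)
for i, neighbors in joined.items():
    print(R[i], "->", neighbors)
```

This baseline computes |R| x |S| distances, which illustrates why the operation becomes expensive on a single machine as both datasets grow.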