Efficient Parallel kNN Joins for Large Data in MapReduce

The kNN join is a useful tool in data mining applications and in spatial and multimedia databases: for every point in a dataset R, it produces the k nearest neighbors (kNN) from a dataset S. Because it combines a join with a nearest-neighbor search, performing kNN joins efficiently is challenging. Meanwhile, the amount of data that applications must process continues to grow quickly, in some cases exponentially. A popular model for large-scale data processing today is the shared-nothing cluster of commodity machines running MapReduce.
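To make the definition concrete, here is a minimal single-machine sketch of a kNN join as a brute-force nested scan. It is not the MapReduce method the paper proposes; it only illustrates what the operation computes. The function name `knn_join`, the use of Euclidean distance, and the tie-breaking by index are all assumptions made for this illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_join(R: Sequence[Point], S: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """For each point index i in R, return the indices of its k nearest
    points in S, nearest first (Euclidean distance, an assumption here)."""
    result: Dict[int, List[int]] = {}
    for i, r in enumerate(R):
        # Scan all of S for every r: O(|R| * |S|) distance computations.
        # Ties in distance are broken by the smaller index in S.
        result[i] = heapq.nsmallest(
            k, range(len(S)), key=lambda j: (math.dist(r, S[j]), j)
        )
    return result


# Hypothetical toy data, chosen only to exercise the function.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
joined = knn_join(R, S, k=2)
print(joined)
```

The quadratic number of distance computations in this baseline is what makes the join expensive on large inputs, which is the motivation for distributing the work across a cluster.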

Provided by: Association for Computing Machinery
Topic: Data Management
Date Added: Mar 2012
Format: PDF
