Determining the K in K-Means with MapReduce

Provided by: Creative Commons
Topic: Data Management
Format: PDF
In this paper, the authors propose a MapReduce implementation of G-means, a variant of k-means that automatically determines k, the number of clusters. They show that their implementation scales to very large datasets and very large values of k, because its computation cost is proportional to nk, where n is the number of points. Other techniques, which run a clustering algorithm with several values of k and keep the value that gives the "best" result, have a computation cost proportional to nk². The authors' experiments confirm that the processing time is proportional to k.
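The following toy cost model in Python makes the nk versus nk² comparison concrete. It counts distance computations and assumes that one k-means pass over n points with c centers costs n·c of them, with the number of Lloyd iterations per run folded into a constant. The function names are hypothetical. The G-means curve also assumes the best case, in which every center splits in every round so the number of centers doubles from 1 up to k. This is an illustrative assumption, not the paper's own analysis.

```python
"""Toy cost model: distance computations needed to pick k.

Hypothetical sketch, not the paper's code. Assumes one k-means pass over
n points with c centers costs n * c distance computations.
"""


def cost_try_every_k(n: int, k: int) -> int:
    """Run k-means once for every candidate c = 1..k, then keep the best."""
    # n * (1 + 2 + ... + k) = n * k * (k + 1) / 2, i.e. proportional to n * k^2
    return sum(n * c for c in range(1, k + 1))


def cost_gmeans_doubling(n: int, k: int) -> int:
    """G-means-style growth, assuming every center splits in every round.

    Under that (best-case) assumption the number of centers goes
    1, 2, 4, ... up to the largest power of two not exceeding k.
    """
    total, c = 0, 1
    while c <= k:
        total += n * c  # one pass over all n points with c centers
        c *= 2          # every center is assumed to split into two
    return total        # geometric sum, always below 2 * n * k


if __name__ == "__main__":
    n = 1000
    for k in (8, 64, 1024):
        print(k, cost_try_every_k(n, k), cost_gmeans_doubling(n, k))
```

For n = 1000 and k = 8 this gives 36,000 versus 15,000 distance computations. Under the doubling assumption the G-means total stays below 2nk, while the try-every-k total grows as nk(k + 1)/2.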
