Mini-Batch K-Means Clustering Using Map-Reduce in Hadoop

Provided by: Creative Commons
Topic: Data Management
Format: PDF
In this paper, the authors describe an approach to data clustering using the mini-batch K-means algorithm. The implementation described here optimizes K-means by making a single pass over the input data, and it produces as many centroids as it determines to be optimal. Avoiding multiple passes over the input data can greatly reduce running time, because in large-scale computations merely reading a large data set accounts for much of the cost. The mini-batch K-means algorithm is implemented using the Hadoop framework.
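The idea of one pass over the data in mini-batches, with each batch processed as a map step and a reduce step, can be illustrated with a small in-process Python sketch. It is not the authors' implementation and does not use Hadoop. It rests on several assumptions. The number of centroids `k` is fixed, whereas the paper chooses it automatically. Centroids are seeded from the first `k` points. Each centroid is updated with the common per-center learning rate `1/count`, which turns each centroid into a running mean of the points assigned to it. The function names (`map_batch`, `reduce_batch`, `mini_batch_kmeans`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def map_batch(batch, centroids):
    """Hypothetical map step: emit (centroid_index, (point, 1)) for each point."""
    for p in batch:
        yield nearest(p, centroids), (p, 1)


def reduce_batch(pairs, dim):
    """Hypothetical reduce step: per centroid, sum assigned points and counts."""
    acc = {}
    for idx, (p, n) in pairs:
        vec, cnt = acc.get(idx, ([0.0] * dim, 0))
        acc[idx] = ([v + x for v, x in zip(vec, p)], cnt + n)
    return acc


def mini_batch_kmeans(points, k, batch_size):
    """Single pass over `points` in mini-batches (assumption: fixed k)."""
    dim = len(points[0])
    centroids = [list(map(float, p)) for p in points[:k]]  # seed: first k points
    counts = [0] * k  # points absorbed by each centroid so far
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        partial = reduce_batch(map_batch(batch, centroids), dim)
        for idx, (vec_sum, n) in partial.items():
            total = counts[idx] + n
            # Running-mean update, equivalent to a per-center learning rate of 1/count.
            centroids[idx] = [(counts[idx] * c + s) / total
                              for c, s in zip(centroids[idx], vec_sum)]
            counts[idx] = total
    return centroids, counts
```

For example, `mini_batch_kmeans([(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)], k=2, batch_size=3)` reads each point exactly once and returns centroids near (1/3, 1/3) and (31/3, 31/3). In a real Hadoop job, mappers would process input splits in parallel and reducers would be keyed by centroid index. The per-centroid sums and counts are what make the batch update decomposable in that way.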
