International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
The growing volume of information produced by advances in technology makes clustering big data a challenging task. The K-means clustering algorithm is one of the most commonly used algorithms for cluster analysis. However, the existing K-means algorithm is inefficient on big data, and improving it remains an open problem. The algorithm is computationally expensive, the quality of the resulting clusters depends heavily on the selection of initial centroids, and it does not guarantee a globally optimal clustering.
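The dependence on initial centroids can be illustrated with a minimal sketch of the standard (Lloyd-style) K-means iteration. The sketch is an illustration only, not the method proposed in this paper: the function names `sq_dist` and `kmeans`, the four-point toy dataset, and the two choices of starting centroids are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points: Sequence[Point], init_centroids: Sequence[Point],
           max_iter: int = 100):
    """Plain K-means started from the given centroids (hypothetical helper).

    Returns (centroids, labels, sse), where sse is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(float(v) for v in c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy data (assumed): corners of a 4 x 1 rectangle, clustered with k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start with one centroid on each short side of the rectangle.
_, _, sse_good = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])

# Start with both centroids on the same short side.
_, _, sse_bad = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])

print(sse_good, sse_bad)  # prints: 1.0 16.0
```

Both runs converge, but the second stops at a clustering whose squared error is sixteen times larger than that of the first. The algorithm reaches a fixed point in each case, and only the starting centroids determine which one.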