International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
In this paper, the authors propose an effective and efficient algorithm for clustering text documents. The algorithm builds on the well-known k-means algorithm. Standard k-means selects its initial cluster centers at random, so the quality of the final clustering depends on that selection. The proposed algorithm eliminates this problem by introducing a new approach for selecting the initial cluster centroids. Several experiments are conducted on the mini-newsgroups dataset to measure the performance of the proposed algorithm, and the results are very promising when compared with two other algorithms: k-means and enhanced k-means.
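The initialization problem mentioned above can be illustrated with a minimal sketch. It is not the proposed algorithm, whose centroid-selection rule is not described here. It only shows plain k-means (Lloyd iterations) converging to different solutions from two different starting pairs. The four 2-D points, the function names `sq_dist` and `kmeans`, and the use of the sum of squared errors (SSE) as the quality measure are assumptions made for illustration; real text clustering would operate on document vectors instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(x) / len(members) for x in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical toy data: two tight pairs of points, far apart along the x-axis.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Both initial centers drawn from the same pair: converges to a poor solution.
_, sse_bad = kmeans(points, [(0, 0), (0, 1)])
# One initial center drawn from each pair: converges to the natural grouping.
_, sse_good = kmeans(points, [(0, 0), (4, 0)])

print(sse_bad)   # 16.0
print(sse_good)  # 1.0
```

Both runs use the same data and the same number of clusters, and only the starting centroids differ. This is the sensitivity that a deterministic centroid-selection scheme is meant to remove.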