Sum of Distance Based Algorithm for Clustering Web Data
Clustering is a data mining technique that groups objects with similar characteristics. The similarity criterion is implementation dependent. Clustering analyzes data objects without consulting a known class label or category, i.e., it is an unsupervised data mining technique. K-means is a widely used clustering algorithm that chooses random initial cluster centers (centroids), one for each cluster. The performance of k-means depends strongly on this initial choice of centroids, and the final centroids may not be optimal because the algorithm can converge to a locally optimal solution.
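The sensitivity of k-means to its starting centroids can be illustrated with a minimal sketch of the standard (Lloyd) iteration. This is not the algorithm proposed in this paper. The function names `kmeans` and `squared_distance` and the four-point toy data set are assumptions made only for illustration, and the initial centroids are passed in explicitly rather than drawn at random so that the outcome is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Standard k-means iteration from explicitly supplied starting centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the iteration has converged
        centroids = new_centroids
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse


# Toy data (assumed): two tight pairs of points that are far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Both starting centroids lie in the left pair: converges to a poor local optimum.
bad_centroids, bad_sse = kmeans(points, [(0, 0), (0, 1)])

# One starting centroid in each pair: converges to the better solution.
good_centroids, good_sse = kmeans(points, [(0, 0), (10, 0)])

print(bad_centroids, bad_sse)    # [(5.0, 0.0), (5.0, 1.0)] 100.0
print(good_centroids, good_sse)  # [(0.0, 0.5), (10.0, 0.5)] 1.0
```

Both runs converge, yet the first stops at a sum of squared errors of 100.0 while the second reaches 1.0 on the same data. The only difference between the two runs is the initial guess of centroids.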