Graph Based Text Document Clustering by Detecting Initial Centroids for K-Means
Document clustering is used in information retrieval to organize a large collection of text documents into meaningful clusters. The k-means algorithm, a partitional clustering method, performs well on document clustering. It organizes a large collection of items into k clusters so that a criterion function is optimized. Because k-means is sensitive to the initial values of the cluster centroids, this paper proposes a graph-based method for calculating appropriate initial cluster centroids. The document collection is represented as a graphical network in which each node represents a document and each edge represents the similarity between two documents. To calculate the initial centroids, the community structure present in the graphical network is detected using an edge deletion technique. Using this community structure, the centrality of each node is calculated.
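The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: documents are represented as raw term-count vectors, edge weights are cosine similarities, the edge deletion rule removes the weakest remaining edge until k connected components appear, and centrality is taken as the sum of a node's remaining edge weights. The names `cosine`, `components`, and `initial_centroids` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
        comps.append(sorted(comp))
    return comps


def initial_centroids(docs, k):
    """Return (communities, seed document index per community)."""
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(vecs)
    # Node = document, edge weight = similarity; zero-similarity pairs get no edge.
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            w = cosine(vecs[i], vecs[j])
            if w > 0:
                edges[(i, j)] = w
    # Assumed deletion rule: drop the weakest edge until k communities appear.
    comps = components(n, edges)
    while len(comps) < k and edges:
        del edges[min(edges, key=edges.get)]
        comps = components(n, edges)

    # Assumed centrality: total weight of a node's surviving edges.
    def strength(u):
        return sum(w for pair, w in edges.items() if u in pair)

    seeds = [max(comp, key=strength) for comp in comps]
    return comps, seeds
```

The vectors of the returned seed documents could then be passed to k-means as its initial centroids. If the similarity graph already has more than k components before any edge is deleted, this sketch returns all of them rather than merging down to k.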