Binary Information Press
Owing to the rapid advance of Internet technology, users must deal with large amounts of raw data from the World Wide Web (WWW) every day. Because clustering is unsupervised and requires no prior knowledge, it has gradually become an efficient tool for helping users analyze information. Unfortunately, almost all clustering algorithms proposed so far fail to cluster text accurately, especially on large-scale text collections. This paper therefore proposes a novel Probability based Text Clustering algorithm via Integrating Feature set construction and Text partition (abbreviated as PTCIFT).