Date Added: Jan 2012
Clustering similar web text items has become increasingly important in many Web and Information Retrieval applications. For several kinds of web text data, external information beyond the textual features is relatively easy to obtain and can be used to improve clustering performance. This external information, called prior information, takes the form of labels and pairwise constraints on sample points. The authors propose a unifying framework that incorporates prior information about cluster membership into web text cluster analysis, and develop a novel semi-supervised clustering model. The proposed framework offers several advantages over existing semi-supervised approaches.
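
To make the idea of pairwise constraints concrete, the sketch below shows one common way such prior information can steer a clustering step. It is not the authors' model, which the abstract does not specify; it is a simplified COP-KMeans-style assignment pass, and the names `violates`, `constrained_assign`, `must_link`, and `cannot_link` are hypothetical, chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def violates(i, c, labels, must_link, cannot_link):
    """Return True if putting point i into cluster c breaks a constraint
    against a point that has already been assigned."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # A must-link partner already sits in a different cluster.
        if other is not None and labels.get(other) is not None and labels[other] != c:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # A cannot-link partner already sits in this cluster.
        if other is not None and labels.get(other) == c:
            return True
    return False


def constrained_assign(points, centroids, must_link=(), cannot_link=()):
    """Assign each point to its nearest centroid that violates no constraint
    (illustrative assumption: points are visited in index order)."""
    labels = {}
    for i, p in enumerate(points):
        # Try centroids from nearest to farthest.
        order = sorted(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for c in order:
            if not violates(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return [labels[i] for i in range(len(points))]


# Toy data: two tight groups of 2-D points and one centroid per group.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]

unconstrained = constrained_assign(points, centroids)
with_cannot_link = constrained_assign(points, centroids, cannot_link=[(0, 1)])
with_must_link = constrained_assign(points, centroids, must_link=[(0, 2)])
```

In this toy setup, the cannot-link pair forces point 1 away from its nearest centroid, and the must-link pair pulls point 2 into the cluster of point 0. A full semi-supervised method would also re-estimate centroids and iterate, which this sketch omits.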