International Journal of Advanced Technology in Engineering and Science (IJATES)
Side information is available alongside text documents in many text mining applications. This side information takes several forms, such as document provenance information, links in the document, other non-textual attributes contained in the document, or user access behavior from web logs. Some of these attributes may carry a large amount of information that is useful for clustering. However, clustering becomes more difficult when some of this information is noisy. This work therefore aims to design an effective clustering approach that combines a classical partitioning algorithm with a probabilistic model.
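The general idea of combining a partitioning step with a probabilistic step can be illustrated with a minimal sketch. It is not the method of this paper, whose details are not given here. It assumes documents are represented as term-weight dictionaries, side information as sets of attribute labels, and it uses a k-means-style cosine assignment whose score is reweighted by a Laplace-smoothed estimate of P(cluster | attribute). The function name `cluster_with_side_info`, the parameters `alpha` and `eps`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of a list of sparse vectors (empty dict if the list is empty)."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}

def cluster_with_side_info(docs, side, seeds, iters=10, alpha=1.0, eps=0.01):
    """Illustrative only: alternate content-based partitioning with a
    probabilistic reweighting derived from side attributes."""
    k = len(seeds)
    centroids = [dict(docs[i]) for i in seeds]
    # Partitioning step (content only): nearest centroid by cosine similarity.
    assign = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
    for _ in range(iters):
        # Probabilistic step: count how often each attribute occurs per cluster.
        counts = defaultdict(lambda: [0] * k)
        for attrs, c in zip(side, assign):
            for a in attrs:
                counts[a][c] += 1
        new_assign = []
        for d, attrs in zip(docs, side):
            scores = []
            for c in range(k):
                weight = 1.0
                for a in attrs:
                    row = counts[a]
                    # Smoothed P(cluster | attribute), relative to uniform 1/k.
                    weight *= k * (row[c] + alpha) / (sum(row) + alpha * k)
                # eps lets side information decide when text overlap is zero.
                scores.append((cosine(d, centroids[c]) + eps) * weight)
            new_assign.append(max(range(k), key=lambda c: scores[c]))
        if new_assign == assign:
            break
        assign = new_assign
        # Recompute centroids; keep the old one if a cluster became empty.
        centroids = [
            centroid([d for d, c in zip(docs, assign) if c == j]) or centroids[j]
            for j in range(k)
        ]
    return assign

# Toy data (hypothetical): the last document shares no terms with the others.
docs = [
    {"cluster": 1, "kmeans": 1},
    {"kmeans": 1, "centroid": 1},
    {"goal": 1, "match": 1},
    {"match": 1, "league": 1},
    {"update": 1},
]
side = [{"src:cs"}, {"src:cs"}, {"src:sport"}, {"src:sport"}, {"src:sport"}]
labels = cluster_with_side_info(docs, side, seeds=[0, 2])
print(labels)
```

In this toy input the last document has no term overlap with either seed, so its text alone cannot separate it, and its cluster is decided by its provenance attribute. A noisy attribute, one spread evenly across clusters, yields a weight close to 1 and so has little influence on the assignment.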