International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
Clustering is the process of partitioning a set of data (or objects) into meaningful subclasses called clusters. Hierarchical document clustering organizes these clusters into a tree, or hierarchy, that facilitates browsing. Document clustering algorithms are important for organizing documents generated by streaming online sources such as newswires and blogs; however, clustering such streams remains a relatively unexplored area in the text document clustering literature. To let users browse and organize documents easily, hierarchical clustering techniques have been proposed that arrange a document collection into a hierarchical tree structure. Nevertheless, hierarchical document clustering still faces several challenges, including high dimensionality, scalability, and accuracy.
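To make the idea of a cluster hierarchy concrete, the following is a minimal sketch of bottom-up (agglomerative) hierarchical clustering over short text documents. It is an illustrative example only, not the method of any particular paper. It assumes whitespace tokenization, raw term-frequency vectors, cosine similarity, and average linkage. The function names `tf_vector`, `cosine`, and `agglomerative` are hypothetical. The result is a binary tree: leaves are document indices, and each internal node is a merge of its two children.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector for a lowercased, whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(docs):
    """Average-linkage agglomerative clustering (illustrative sketch).

    Returns the root of a binary tree in which leaves are document
    indices and internal nodes are (left, right) tuples.
    """
    if not docs:
        return None
    vecs = [tf_vector(d) for d in docs]
    # Each cluster is (subtree, list of member document indices).
    clusters = [(i, [i]) for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average linkage: mean pairwise similarity across the two clusters.
                sim = sum(cosine(vecs[a], vecs[b]) for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        # Delete the higher index first so index i remains valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


docs = [
    "stock market falls",
    "stock market rises",
    "team wins football match",
    "football team loses match",
]
print(agglomerative(docs))  # ((2, 3), (0, 1))
```

The two sports documents merge first because they share the most terms, followed by the two finance documents. The two topic subtrees are joined only at the root. This naive version recomputes linkage scores in every round, which costs on the order of n³ similarity computations for n documents. That cost illustrates why scalability and high-dimensional term vectors are challenges for hierarchical document clustering.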