Clustering Web Documents Based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets

Provided by: SAI Consulting
Topic: Data Management
Format: PDF
Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT).
