
An Effective Web Document Clustering for Information Retrieval


Executive Summary

The size of the web has increased exponentially over the past few years, and thousands of documents related to any one subject are now available to the user. In this paper, the authors introduce a combined approach to clustering web pages that first finds the frequent sets and then clusters the documents. The frequent sets are generated with the Frequent Pattern growth (FP-growth) technique. By then applying the Fuzzy C-Means algorithm to them, the authors obtain clusters whose documents are highly related and share similar features. They implemented their approach with the Gensim package because of its simplicity and robustness. They compare their results with two other combined approaches: (Frequent Pattern growth, K-means) and (Frequent Pattern growth, Cosine-Similarity).
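The two-stage pipeline described in the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the standard library rather than Gensim, and the toy corpus, the `min_support` threshold, the fuzzifier `m = 2`, and the choice to represent each document as a binary vector over the mined frequent itemsets are all hypothetical. The helper names `build_tree`, `fp_growth` and `fuzzy_c_means` are illustrative. FP-growth builds a frequent-pattern tree and mines it recursively through conditional pattern bases; fuzzy c-means then alternates between weighted center updates and membership updates.

```python
import math
import random
from collections import defaultdict


class Node:
    """An FP-tree node: item label, count, parent link, children keyed by item."""

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return supports and header table."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])  # node-link list per item
            node = node.children[item]
            node.count += count
    return frequent, header


def fp_growth(transactions, min_support, suffix=()):
    """Yield (frozenset, support) for every frequent itemset."""
    frequent, header = build_tree(transactions, min_support)
    for item, sup in frequent.items():
        pattern = (item,) + suffix
        yield frozenset(pattern), sup
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_support, pattern)


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix u)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random memberships, each row normalised to sum to 1
        row = [rng.random() for _ in range(c)]
        u.append([x / sum(row) for x in row])
    for _ in range(iters):
        centers = []
        for j in range(c):  # center_j = sum_i u_ij^m x_i / sum_i u_ij^m
            w = [u[i][j] ** m for i in range(n)]
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sum(w)
                            for d in range(dim)])
        for i in range(n):  # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            u[i] = [1.0 / sum((dist[j] / dist[k]) ** (2 / (m - 1)) for k in range(c))
                    for j in range(c)]
    return centers, u


# Toy corpus: each document reduced to its set of terms (hypothetical data).
docs = [
    {"python", "pandas", "dataframe"}, {"python", "pandas", "numpy"},
    {"python", "numpy", "dataframe"}, {"football", "goal", "league"},
    {"football", "league", "match"}, {"goal", "league", "match"},
]
itemsets = dict(fp_growth([(d, 1) for d in docs], min_support=2))
features = sorted(itemsets, key=lambda s: (len(s), sorted(s)))
# Assumed representation: one binary feature per frequent itemset,
# set to 1 if the document contains every term of that itemset.
vectors = [[1.0 if f <= d else 0.0 for f in features] for d in docs]
centers, u = fuzzy_c_means(vectors, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(len(itemsets), labels)
```

Each document ends up with a degree of membership in every cluster rather than a single assignment; taking the largest membership, as `labels` does here, gives a hard label when one is needed. Replacing `fuzzy_c_means` with k-means or a cosine-similarity grouping over the same vectors would loosely mirror the baselines the authors compare against.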

  • Format: PDF
  • Size: 388.1 KB