Implementation of Kea-Keyphrase Extraction Algorithm by Using Bisecting K-Means Clustering Technique for Large and Dynamic Data Set

Provided by: Creative Commons
Topic: Data Management
Format: PDF
In most traditional document clustering techniques, the total number of clusters is not known in advance, and the cluster that contains the target information cannot be determined because no semantic description is associated with the clusters. To solve this problem, this work proposes a new clustering algorithm based on the Kea keyphrase extraction algorithm, which uses machine learning techniques to return several keyphrases from the source documents. In this paper, documents are grouped into several clusters as in Bisecting K-means, but the number of clusters is determined automatically by the algorithm, using heuristics based on the extracted keyphrases.
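The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the heuristics, so the stopping rule used here (stop splitting a cluster once all of its documents share at least one keyphrase) is an assumption made for illustration only. Kea itself is not invoked; each document is assumed to arrive as an already extracted set of keyphrases, and the function names `bisect`, `is_cohesive`, and `cluster` are hypothetical.

```python
from typing import List, Set, Tuple

Doc = Set[str]  # a document, represented by its extracted keyphrases


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bisect(indices: List[int], vectors: List[List[float]],
           iterations: int = 10) -> Tuple[List[int], List[int]]:
    """Split one cluster in two with a basic 2-means pass."""
    # Deterministic seeding: the first member and the member farthest from it.
    first = indices[0]
    far = max(indices, key=lambda i: sq_dist(vectors[i], vectors[first]))
    centers = [vectors[first], vectors[far]]
    left: List[int] = []
    right: List[int] = []
    for _ in range(iterations):
        left, right = [], []
        for i in indices:
            d0 = sq_dist(vectors[i], centers[0])
            d1 = sq_dist(vectors[i], centers[1])
            (left if d0 <= d1 else right).append(i)
        if not left or not right:
            break  # degenerate split, nothing more to refine
        centers = [mean([vectors[i] for i in left]),
                   mean([vectors[i] for i in right])]
    return left, right


def is_cohesive(indices: List[int], docs: List[Doc]) -> bool:
    """ASSUMED heuristic: a cluster is final once its documents share a keyphrase."""
    shared = set.intersection(*(docs[i] for i in indices))
    return len(shared) > 0


def cluster(docs: List[Doc]) -> List[List[int]]:
    """Bisect repeatedly; the number of clusters falls out of the stopping rule."""
    vocab = sorted(set().union(*docs))
    # Binary keyphrase-presence vectors, one per document.
    vectors = [[1.0 if t in d else 0.0 for t in vocab] for d in docs]
    pending = [list(range(len(docs)))] if docs else []
    done: List[List[int]] = []
    while pending:
        current = pending.pop()
        if len(current) == 1 or is_cohesive(current, docs):
            done.append(sorted(current))
            continue
        left, right = bisect(current, vectors)
        if not left or not right:
            done.append(sorted(current))  # cannot be split further
            continue
        pending.extend([left, right])
    return sorted(done)


docs = [
    {"clustering", "k-means"},
    {"clustering", "centroid"},
    {"keyphrase", "extraction"},
    {"keyphrase", "naive bayes"},
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

In this toy input no keyphrase is common to all four documents, so the initial cluster is bisected once; each half then shares a keyphrase ("clustering" and "keyphrase"), which stops further splitting and also serves as a readable label for the cluster. The cluster count of two is never supplied by the caller.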
