A Semantic Approach for Document Clustering

Date Added: Jul 2009
Format: PDF

Conventional document mining systems rely mainly on the presence or absence of keywords. However, simple word counts and term-frequency distributions do not capture the meaning behind the words, which limits how well the texts can be mined. This paper presents a semantic understanding-based approach to document mining. The approach uses semantic notions both to represent text and to measure similarity between text documents. The representation scheme reflects the relations that exist between concepts and supports more accurate similarity measurements, which in turn improve mining performance. One document mining process, semantic document clustering, is investigated and tackled in several ways.
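To make the contrast between keyword matching and concept-level matching concrete, here is a minimal Python sketch. It is not the paper's method, which the abstract does not specify. It assumes a tiny hand-written concept map (the hypothetical `CONCEPTS` dictionary) in place of a real semantic representation, and it uses plain cosine similarity over term counts as a stand-in similarity measure.

```python
import math
from collections import Counter

# Hypothetical toy concept map, an assumption made for illustration only.
# A real system would derive concepts from a semantic analysis or a lexical resource.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "physician": "doctor",
}


def keyword_vector(text):
    """Bag-of-words counts: every surface word is its own dimension."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Counts over concepts: words mapped to the same concept share a dimension."""
    return Counter(CONCEPTS.get(tok, tok) for tok in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = "car repair"
doc2 = "automobile repair"

# The two documents share only one surface word but the same two concepts.
keyword_sim = cosine(keyword_vector(doc1), keyword_vector(doc2))
concept_sim = cosine(concept_vector(doc1), concept_vector(doc2))

print("keyword similarity:", round(keyword_sim, 3))
print("concept similarity:", round(concept_sim, 3))
```

In this toy case the concept-level score is higher than the keyword score because "car" and "automobile" fall into the same dimension. A clustering step could then group documents using the concept-level similarities instead of raw keyword overlap.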