Semantic-based Conversation Clustering in Large Text Database

Provided by: Binary Information Press
Topic: Data Management
Format: PDF
In this paper, the authors focus on the problems of high dimensionality and sparsity in conversation clustering, where a conversation is a set of instant messages correlated through source and destination addresses. They propose a new algorithm, called FI-KMeans, based on k-means and top-k frequent term sets. To support FI-KMeans, they also propose a method that captures term mutual information and context in conversations, and an algorithm that measures the similarity between conversations. The experiments show that FI-KMeans achieves higher precision than standard k-means and bisecting k-means on large volumes of highly sparse conversations, while still maintaining good scalability.
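
The general idea of shrinking a sparse term space before clustering can be illustrated with a minimal Python sketch. It is not the FI-KMeans algorithm from the paper, whose details are not given in this abstract. As simplifying assumptions, it ranks single terms by document frequency instead of mining frequent term sets, uses plain cosine similarity instead of the paper's mutual-information-based measure, and seeds k-means with the first k vectors. All function names (top_k_terms, vectorize, cosine, kmeans) are hypothetical.

```python
from collections import Counter
import math


def top_k_terms(conversations, k):
    """Return the k terms that appear in the most conversations.

    Assumption: single terms stand in for the paper's frequent term sets.
    """
    doc_freq = Counter()
    for conv in conversations:
        doc_freq.update(set(conv.lower().split()))
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(conversation, vocab):
    """Map a conversation to term counts over the reduced vocabulary."""
    counts = Counter(conversation.lower().split())
    return [float(counts[term]) for term in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iterations=10):
    """Plain k-means using cosine similarity.

    Assumption: the first k vectors serve as initial centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    conversations = [
        "server crash restart server logs",
        "lunch pizza order lunch",
        "server logs error crash",
        "pizza order delivery lunch",
    ]
    vocab = top_k_terms(conversations, k=6)
    vectors = [vectorize(conv, vocab) for conv in conversations]
    print(vocab)
    print(kmeans(vectors, k=2))
```

Restricting the vocabulary to a small set of frequent terms keeps each vector short and dense, which is the kind of dimensionality reduction the abstract motivates. A real implementation would need a better seeding strategy and the similarity measure described in the paper itself.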
