A Mutually Supervised Ensemble Approach for Clustering Heterogeneous Datasets

The authors present an algorithm for clustering two contextually related, heterogeneous datasets that use different feature sets but describe overlapping (non-disjoint) sets of objects. The method first clusters each dataset individually and then combines the resulting clusters. It iteratively refines the two sets of clusters with a mutually supervised approach that maximizes their mutual entropy, and finally computes a single set of clusters. The authors applied the algorithm to a document collection, using multiple feature sets extracted by natural language preprocessing methods.
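The following is a minimal sketch of the general idea, not the authors' algorithm, whose details the abstract does not give. It rests on several assumptions: both datasets describe exactly the same objects in the same row order (the abstract only requires that they overlap), each dataset is clustered with a plain k-means, "mutual supervision" is approximated by letting each view recompute its centroids from the other view's labels, and agreement is reported as the mutual information of the two labelings rather than explicitly maximized. All function names are hypothetical.

```python
import numpy as np


def distances(X, centroids):
    """Euclidean distance from every row of X to every centroid, shape (n, k)."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)


def centroids_from(X, labels, k, fallback):
    """Mean of each cluster; an empty cluster keeps its fallback centroid."""
    return np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else fallback[j]
        for j in range(k)
    ])


def kmeans(X, k, n_iter=20):
    """Plain k-means with farthest-first initialisation (deterministic)."""
    chosen = [X[0]]
    for _ in range(k - 1):
        gap = distances(X, np.array(chosen)).min(axis=1)
        chosen.append(X[gap.argmax()])
    centroids = np.array(chosen)
    for _ in range(n_iter):
        labels = distances(X, centroids).argmin(axis=1)
        centroids = centroids_from(X, labels, k, centroids)
    return labels


def mutual_information(a, b, k):
    """Mutual information (in nats) between two labelings with k clusters each."""
    joint = np.zeros((k, k))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())


def mutually_supervised(X1, X2, k, n_rounds=10):
    """Cluster two views separately, refine them against each other, then merge."""
    # Step 1: cluster each dataset individually.
    l1, l2 = kmeans(X1, k), kmeans(X2, k)
    c1 = centroids_from(X1, l1, k, X1[:k])
    c2 = centroids_from(X2, l2, k, X2[:k])
    # Step 2 (assumed refinement rule): each view is supervised by the other's labels.
    for _ in range(n_rounds):
        c1 = centroids_from(X1, l2, k, c1)
        l1 = distances(X1, c1).argmin(axis=1)
        c2 = centroids_from(X2, l1, k, c2)
        l2 = distances(X2, c2).argmin(axis=1)
    # Step 3: single set of clusters from the scaled distances in both views.
    d1, d2 = distances(X1, c1), distances(X2, c2)
    final = (d1 / d1.mean() + d2 / d2.mean()).argmin(axis=1)
    return final, mutual_information(l1, l2, k)
```

Because the centroids of one view are recomputed from the other view's labels, cluster indices end up aligned across the two views, which is what makes the summed distances in the last step meaningful. A real implementation would also need to handle objects that appear in only one dataset.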

Provided by: Fairmont State University
Topic: Data Management
Date Added: Jul 2011
Format: PDF
