A Mutually Supervised Ensemble Approach for Clustering Heterogeneous Datasets
The authors present an algorithm for clustering two contextually related, heterogeneous datasets that are described by different feature sets but whose sets of objects are not disjoint. The method first clusters each dataset individually and then combines the resulting clusterings. It iteratively refines the two sets of clusters with a mutually supervised approach that maximizes their mutual entropy, and finally computes a single set of clusters. The authors apply the algorithm to a document collection represented by multiple feature sets extracted with natural language preprocessing methods.
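The abstract does not say how agreement between the two clusterings is scored. As a hedged illustration only, assuming "mutual entropy" refers to the mutual information between the two cluster assignments restricted to the objects both datasets share, the quantity being maximized could be computed as below. The function names (`mutual_information`, `shared_mutual_information`) are hypothetical; the sketch only scores a pair of clusterings and does not implement the authors' mutually supervised refinement or the final merging step.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same objects.

    Assumption: 'mutual entropy' in the abstract is read as mutual information.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same objects")
    n = len(labels_a)
    if n == 0:
        return 0.0
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts n_ab
    count_a = Counter(labels_a)               # marginal counts n_a
    count_b = Counter(labels_b)               # marginal counts n_b
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))) rewritten with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def shared_mutual_information(clusters_1, clusters_2):
    """Score two clusterings (dicts: object id -> cluster label) on shared objects only.

    Objects present in just one dataset are ignored, since the datasets are
    non-disjoint but need not contain the same objects.
    """
    shared = [obj for obj in clusters_1 if obj in clusters_2]
    return mutual_information(
        [clusters_1[obj] for obj in shared],
        [clusters_2[obj] for obj in shared],
    )


if __name__ == "__main__":
    # Hypothetical toy data: documents d1..d4 appear in both datasets.
    c1 = {"d1": "x", "d2": "x", "d3": "y", "d4": "y", "d5": "x"}
    c2 = {"d1": 0, "d2": 0, "d3": 1, "d4": 1, "d6": 1}
    print(shared_mutual_information(c1, c2))  # equals log(2) for these toy labels
```

On the toy data the shared objects are split identically into two equal clusters, so the score reaches its maximum of log 2; two labelings that cut across each other evenly, such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]`, score 0.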