Pairwise Constrained Clustering for Sparse and High Dimensional Feature Spaces

Executive Summary

Clustering high-dimensional data with sparse features is challenging because pairwise distances between data items become uninformative in high-dimensional spaces. To address this challenge, the authors propose two novel semi-supervised clustering methods that incorporate prior knowledge in the form of pairwise cluster-membership constraints. Specifically, they project the high-dimensional data onto a much lower-dimensional subspace in which the rough clustering structure defined by the prior knowledge is strengthened. Metric learning is then performed in that subspace to construct more informative pairwise distances.
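The two-stage idea in the summary (constraint-guided projection, then metric learning in the subspace) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the summary does not specify. It assumes a simple stand-in for each stage: the projection keeps the top eigenvectors of the cannot-link scatter minus the must-link scatter, and the metric is a Mahalanobis matrix taken as the inverse of the regularized must-link scatter in the subspace. All function names, the toy data, and the parameters `k` and `reg` are hypothetical.

```python
import numpy as np

def pair_scatter(X, pairs):
    """Sum of outer products of difference vectors over the given index pairs."""
    D = np.array([X[i] - X[j] for i, j in pairs])
    return D.T @ D

def constrained_projection(X, must_link, cannot_link, k):
    """Assumed stand-in: keep the k directions that spread cannot-link
    pairs apart while keeping must-link pairs close."""
    S = pair_scatter(X, cannot_link) - pair_scatter(X, must_link)
    _, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]      # d x k, largest eigenvalues first

def learn_metric(Z, must_link, reg=1e-3):
    """Assumed stand-in: Mahalanobis matrix from the inverse of the
    regularized average must-link scatter in the subspace."""
    S_m = pair_scatter(Z, must_link) / len(must_link)
    return np.linalg.inv(S_m + reg * np.eye(Z.shape[1]))

def mahalanobis(Z, M, i, j):
    d = Z[i] - Z[j]
    return float(np.sqrt(d @ M @ d))

# Toy sparse data: 40 items, 200 features, two clusters that differ
# only on the first 20 features (hypothetical example data).
rng = np.random.default_rng(0)
X = rng.random((40, 200)) * (rng.random((40, 200)) < 0.05)
X[:20, :10] += 1.0
X[20:, 10:20] += 1.0

must_link = [(0, 1), (2, 3), (20, 21), (22, 23)]
cannot_link = [(0, 20), (1, 21), (2, 22), (3, 23)]

W = constrained_projection(X, must_link, cannot_link, k=2)
Z = X @ W                               # data in the reduced subspace
M = learn_metric(Z, must_link)

ml_dist = np.mean([mahalanobis(Z, M, i, j) for i, j in must_link])
cl_dist = np.mean([mahalanobis(Z, M, i, j) for i, j in cannot_link])
print("mean must-link distance:", ml_dist)
print("mean cannot-link distance:", cl_dist)
```

On this toy data the learned distances should separate the constrained pairs, with must-link pairs ending up much closer than cannot-link pairs. A clustering algorithm such as k-means could then be run on `Z` under the metric `M`.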

  • Format: PDF
  • Size: 237.6 KB