Implementation of an Entropy Weighted K-Means Algorithm for High Dimensional Sparse Data
In this paper, the authors present a partition-based algorithm for clustering high-dimensional objects in subspaces, applied to the iris gene dataset. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. This is the data sparsity problem faced when clustering high-dimensional data. The proposed algorithm extends the K-Means clustering process to calculate a weight for each dimension in each cluster, and uses the weight values to identify the subsets of important dimensions that categorize different clusters.
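The per-cluster dimension weighting described above can be illustrated with a minimal sketch. The abstract does not state the update rules, so this sketch assumes the commonly cited entropy-weighted formulation: each cluster's weight for a dimension is proportional to exp(-D / gamma), where D is the within-cluster sum of squared deviations on that dimension and gamma is a smoothing parameter. The function name `ewkm`, the parameters `gamma`, `n_iter`, `seed`, and `init_centers`, and the stopping rule are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ewkm(X, k, gamma=1.0, n_iter=50, seed=0, init_centers=None):
    """Sketch of an entropy-weighted k-means loop (assumed update rules)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if init_centers is None:
        # Assumption: initialise centers from k distinct random objects.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init_centers, dtype=float)
    weights = np.full((k, d), 1.0 / d)  # start with uniform dimension weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Squared difference per object, cluster, and dimension: shape (n, k, d).
        sq = (X[:, None, :] - centers[None, :, :]) ** 2
        # Weighted distance of each object to each cluster center: shape (n, k).
        dist = (sq * weights[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        for l in range(k):
            members = X[new_labels == l]
            if len(members) == 0:
                continue  # empty cluster: keep its previous center and weights
            centers[l] = members.mean(axis=0)
            # Within-cluster dispersion on each dimension.
            disp = ((members - centers[l]) ** 2).sum(axis=0)
            # Softmax of -disp / gamma, shifted by the max for numerical stability.
            logits = -disp / gamma
            logits -= logits.max()
            w = np.exp(logits)
            weights[l] = w / w.sum()
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers, weights
```

In this sketch, a dimension on which a cluster's members are tightly grouped receives a weight close to 1 for that cluster, while widely spread dimensions receive weights close to 0. Larger values of `gamma` flatten the weights toward uniform. Like plain K-Means, the loop can settle in a poor local optimum depending on the initial centers.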