Implementation of an Entropy Weighted K-Means Algorithm for High Dimensional Sparse Data

In this paper, the authors present a partition-based algorithm for clustering high-dimensional objects in subspaces and apply it to the iris gene dataset. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space; this is the data sparsity problem faced when clustering high-dimensional data. The proposed algorithm extends the K-Means clustering process to calculate a weight for each dimension in each cluster, and uses these weight values to identify the subsets of important dimensions that categorize the different clusters.
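The per-cluster dimension weighting described above can be illustrated with a minimal sketch. It assumes the entropy-regularised update commonly used by entropy weighted k-means variants: for each cluster l, compute the dispersion D_lj of its members along dimension j, then set w_lj = exp(-D_lj / gamma) / sum_t exp(-D_lt / gamma), so that tightly grouped dimensions receive large weights. The function name `ewkm`, the `gamma` parameter value, and the deterministic farthest-first initialisation are illustrative assumptions, not the authors' implementation.

```python
import math


def _farthest_first(X, k):
    """Hypothetical deterministic init: start at X[0], then repeatedly add
    the point farthest (squared Euclidean) from its nearest chosen center."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(
            X,
            key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers),
        )
        centers.append(list(far))
    return centers


def ewkm(X, k, gamma=1.0, max_iter=100):
    """Sketch of an entropy weighted k-means loop (assumed update rules).

    X: list of equal-length lists of floats (n objects, m dimensions).
    Returns (labels, centers, weights); weights[l][j] is the weight of
    dimension j in cluster l, and each row of weights sums to 1.
    """
    n, m = len(X), len(X[0])
    centers = _farthest_first(X, k)
    weights = [[1.0 / m] * m for _ in range(k)]  # start with equal weights
    labels = [-1] * n

    for _ in range(max_iter):
        # Step 1: assign each object to the cluster minimising the weighted
        # squared distance sum_j w_lj * (x_ij - z_lj)^2.
        new_labels = []
        for x in X:
            dists = [
                sum(w * (xj - zj) ** 2 for w, xj, zj in zip(weights[l], x, centers[l]))
                for l in range(k)
            ]
            new_labels.append(min(range(k), key=lambda l: dists[l]))
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels

        # Step 2: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        for l in range(k):
            members = [X[i] for i in range(n) if labels[i] == l]
            if members:
                centers[l] = [sum(col) / len(members) for col in zip(*members)]

        # Step 3: per-cluster dispersion D_lj along each dimension, then
        # w_lj = exp(-D_lj / gamma) / sum_t exp(-D_lt / gamma).
        for l in range(k):
            D = [0.0] * m
            for i in range(n):
                if labels[i] == l:
                    for j in range(m):
                        D[j] += (X[i][j] - centers[l][j]) ** 2
            d_min = min(D)  # shift by the minimum for numerical stability
            e = [math.exp(-(d - d_min) / gamma) for d in D]
            total = sum(e)
            weights[l] = [v / total for v in e]

    return labels, centers, weights
```

On two well-separated groups that differ only in the first two dimensions and share a noisy third dimension, the sketch should place each group in its own cluster and give the noisy dimension a near-zero weight in both clusters. A smaller `gamma` concentrates the weights on fewer dimensions; a larger one spreads them more evenly.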


Resource Details

Provided by: Creative Commons
Topic: Data Management
Format: PDF