Unsupervised Learning Algorithms to Identify the Dense Cluster in Large Datasets

Executive Summary

High-dimensional databases contain large datasets, which complicates the cluster identification problem, particularly the identification of dense clusters in noisy data. The authors' analysis identifies clusters by finding densely intra-connected subgraphs. They employ a pattern recognition representation of the graph and solve the cluster identification problem using K-means, K-modes, and single linkage clustering. The computational analysis indicates that, when running on 150 CPUs, one of their algorithms can solve a cluster identification problem on a dataset with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that the program can handle very large data clustering problems efficiently.
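
For readers unfamiliar with K-means, one of the algorithms named in the summary, the following is a minimal single-process sketch of the standard Lloyd's iteration in Python. It is not the authors' implementation and does not reproduce their graph representation or their parallel setup. The function name `kmeans`, its parameters, and the sample points are assumptions chosen for illustration only.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Illustrative Lloyd's algorithm on a list of equal-length numeric tuples.

    Hypothetical helper for this summary, not code from the paper.
    """
    rng = random.Random(seed)
    # Start from k input points picked at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up sample data: two small, well-separated groups of 2-D points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_labels, sample_centroids = kmeans(sample_points, k=2)
print(sample_labels, sample_centroids)
```

On the sample data, the first three points should share one label and the last three the other. A dataset of the size described in the summary would need a vectorized or distributed implementation rather than this pure-Python loop.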

  • Format: PDF
  • Size: 354.1 KB