A Scalable Algorithm for Single-Linkage Hierarchical Clustering on Distributed-Memory Architectures
Hierarchical clustering is the problem of discovering the large-scale cluster structure of a dataset by building a dendrogram that captures the full range of clustering behavior in the data, from the most general cluster, which encompasses the entire dataset, to the most fine-grained clusters, each of which contains a single data point. It is a widely used technique for evaluating the cluster structure of a dataset. Hierarchical clustering offers two notable advantages over partitional clustering: the number of clusters does not need to be specified in advance, and the shape of the resulting dendrogram can offer insight into the larger structure of the data, e.g., by establishing a phylogenetic tree among a set of species.
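For reference, single-linkage clustering (the variant named in the title) can be computed on one machine with Kruskal's minimum-spanning-tree algorithm. Scanning pairwise distances in increasing order and merging the two clusters joined by each edge that connects different clusters produces exactly the single-linkage dendrogram. The sketch below is a minimal illustration that assumes Euclidean distance over a small in-memory list of points. It is not the distributed algorithm the paper proposes. The function name `single_linkage` and its output layout, loosely modeled on SciPy's linkage matrix, are hypothetical choices made for this example.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Hypothetical helper: return the single-linkage merge sequence.

    Each merge is (cluster_a, cluster_b, distance, new_size). Leaves are
    numbered 0..n-1 and the k-th merge creates cluster n + k, loosely
    following the layout of SciPy's linkage matrix.
    """
    n = len(points)
    parent = list(range(n))      # union-find forest over point indices
    cluster_id = list(range(n))  # dendrogram id currently held by each root
    size = [1] * n               # number of points under each root

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: consider every pairwise edge in order of increasing distance
    # (assumes Euclidean distance; O(n^2) edges are materialized).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # same cluster already; edge is not in the MST
        merges.append((cluster_id[ri], cluster_id[rj], d, size[ri] + size[rj]))
        # Union by size: attach the smaller tree under the larger one.
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]
        cluster_id[ri] = n + len(merges) - 1
        if len(merges) == n - 1:
            break  # a single cluster remains: the dendrogram root
    return merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 2)]
    for merge in single_linkage(pts):
        print(merge)
```

On the four sample points, the two nearby pairs merge first (at distances 1.0 and 2.0), and the two resulting clusters then merge at distance 5.0, the closest cross-cluster pair. Because this sketch materializes and sorts all O(n^2) pairwise distances, its time and memory grow quickly with dataset size, which illustrates why large datasets motivate parallel or distributed formulations.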
Provided by: Institute of Electrical and Electronics Engineers (IEEE) | Topic: Big Data | Date Added: Oct 2013 | Format: PDF