Automatic Evaluation of Cluster in Unlabeled Datasets

Date Added: Jan 2012
Format: PDF

All clustering algorithms ultimately rely on one or more human inputs, the most important of which is the number of clusters (c) to seek. There are "adaptive" methods that claim to relieve the user of this choice, but they ultimately make it by thresholding some value in the code. The choice of c is thus merely transferred to the equivalent choice of a hidden threshold that determines c automatically. This work investigates a new technique, called Spectral VAT, for estimating the number of clusters to look for in unlabeled data. It couples the VAT (Visual Assessment of Cluster Tendency) algorithm with spectral analysis and several common image processing techniques. Several numerical datasets are presented to illustrate the effectiveness of the proposed method.
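
For readers unfamiliar with VAT, the following is a minimal sketch of the reordering step of the standard VAT algorithm as it is commonly described, not of the Spectral VAT method itself, whose spectral and image processing stages are not detailed in this abstract. The function names vat_order and vat_image are illustrative choices, and the input is assumed to be a symmetric pairwise dissimilarity matrix. Displayed as an image, the reordered matrix tends to show dark blocks along the diagonal, and the number of such blocks suggests a value for c.

```python
import numpy as np


def vat_order(D):
    """Return a VAT-style ordering of the rows of a square dissimilarity matrix.

    Assumes D is symmetric with a zero diagonal (hypothetical helper,
    not code from the paper).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in D.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = set(range(n)) - {int(i)}
    # min_dist[j] = smallest dissimilarity from j to any point ordered so far.
    min_dist = D[i].copy()
    while remaining:
        rem = sorted(remaining)
        # Prim-like step: pick the unordered point closest to the ordered set.
        j = rem[int(np.argmin(min_dist[rem]))]
        order.append(j)
        remaining.remove(j)
        min_dist = np.minimum(min_dist, D[j])
    return order


def vat_image(D):
    """Return the reordered dissimilarity matrix and the ordering used."""
    D = np.asarray(D, dtype=float)
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Calling vat_image on a dissimilarity matrix built from well-separated groups should place the members of each group in consecutive positions, which is what produces the diagonal blocks in the displayed image.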