Xiamen Sharelink Technology Co.,LTD
Imbalanced data distribution remains an unsolved problem in data mining and machine learning. This paper reviews the class-imbalance problem in classification learning and extends it to clustering, since data clustering is an important and frequently used unsupervised learning method. Two verification methods, each based on a different aspect of the original data, are proposed to test the influence of class-imbalanced data on clustering. Furthermore, the authors conduct experiments at different imbalance ratios to explore the ratio's importance for clustering algorithms, since it is a very important factor for performance in classification learning.
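To make the notion of an imbalance ratio concrete, the following is a minimal sketch and not the authors' procedure. It assumes the common definition (majority-class size divided by minority-class size) and a two-class dataset; the function names `imbalance_ratio` and `subsample_to_ratio` are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class size divided by minority-class size (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def subsample_to_ratio(points, labels, minority, ratio, seed=0):
    """Randomly drop minority-class points so that majority/minority is roughly `ratio`.

    Hypothetical helper: assumes a two-class dataset and that the class
    named by `minority` should end up as the smaller one.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Number of minority points to keep: at least one, never more than available.
    keep = max(1, round(len(majority_idx) / ratio))
    keep = min(keep, len(minority_idx))
    kept = sorted(majority_idx + rng.sample(minority_idx, keep))
    return [points[i] for i in kept], [labels[i] for i in kept]


# Build datasets at several ratios from one balanced source set.
points = list(range(200))
labels = [0] * 100 + [1] * 100
for target in (1, 5, 10, 50):
    _, sub_labels = subsample_to_ratio(points, labels, minority=1, ratio=target)
    print(target, imbalance_ratio(sub_labels))
```

Each subsampled dataset could then be passed to a clustering algorithm of choice, and the resulting partitions compared across ratios.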