Implementing K-Means Algorithm Using Row Store and Column Store Databases: A Case Study
K-means clustering is an important algorithm for identifying structure in data and is among the simplest clustering algorithms. It takes as input a predefined number of clusters, the K in its name. "Means" refers to averages: each cluster is represented by the average location of all its members. In this paper, a novel approach to seeding the clusters with a latent data structure is proposed. This is expected to reduce both the need to specify the number of clusters a priori and the time to convergence, by providing near-optimal initial cluster centers.
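To make the baseline procedure concrete, the following is a minimal Python sketch of the standard K-means iteration, in which the initial centers are passed in explicitly so that the effect of seeding is visible. It is not the seeding method proposed in this paper, and it does not reflect any database implementation; the function name `kmeans`, the `max_iter` parameter, and the sample points are assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centers, max_iter=100):
    """Standard K-means iteration from caller-supplied initial centers.

    Returns (centers, labels, iterations_used). All names here are
    illustrative assumptions, not taken from the paper.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point gets the index of its nearest center.
        labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center as-is
        if new_centers == centers:  # no center moved: converged
            return centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter


# Two well-separated groups; K = 2 is implied by the number of seeds.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, iterations = kmeans(points, centers=[(0, 0), (10, 10)])
print(labels, iterations)
```

Because K is fixed by the number of initial centers supplied, and the number of iterations depends on how close those centers are to the final ones, a good seeding strategy can plausibly address both concerns raised above.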