Date Added: Jan 2011
Two standard algorithms for data clustering are Expectation Maximization (EM) and K-means. The authors run both algorithms on a variety of data sets to evaluate how well they perform. For high-dimensional data, they use random projection and Principal Components Analysis (PCA) to reduce the dimensionality. The K-means algorithm finds k clusters by first choosing k data points at random as the initial cluster centers. Each data point is then assigned to the cluster whose center is closest to it. Each cluster center is then replaced by the mean of all the data points assigned to that cluster. The assignment and update steps are repeated until no data point is reassigned to a different cluster.
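
The K-means procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The summary does not say how "closest" is measured, so squared Euclidean distance is assumed here. The function name `kmeans`, the `seed` and `max_iter` parameters, and the rule that an empty cluster keeps its previous center are likewise assumptions made for the example.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Choose k data points at random as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = [None] * len(points)

    # max_iter is a safety cap (assumption); the description only says
    # "iterate until no data point is reassigned".
    for _ in range(max_iter):
        # Assign each point to the cluster whose center is closest
        # (squared Euclidean distance, an assumption).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Stop when no data point has been reassigned.
        if new_labels == labels:
            break
        labels = new_labels

        # Replace each center by the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))

    return centers, labels


# Two well-separated groups of three points each (made-up data).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initial centers.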