Binary Information Press
MapReduce is a simplified programming model of distributed parallel computing. It is an important technology of Google and is commonly used for data-intensive distributed parallel computing. Cluster analysis is the most important data mining methods. Efficient parallel algorithms and frameworks are the key to meeting the scalability and performance requirements entailed in such scientific data analyses. In order to improve the deficiency of the long time in large-scale data sets clustering on the single computer, the authors design and realize a parallel K-means algorithm based on MapReduce framework.