Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Provided by: Binary Information Press
MapReduce is a simplified programming model for distributed parallel computing. It is a key technology at Google and is widely used for data-intensive distributed parallel computation. Cluster analysis is one of the most important data mining methods, and efficient parallel algorithms and frameworks are key to meeting the scalability and performance requirements of such scientific data analyses. To overcome the long running time of clustering large-scale data sets on a single computer, the authors design and implement a parallel K-means algorithm based on the MapReduce framework.
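
As a concrete illustration (a minimal sketch, not the paper's own code), a single K-means iteration maps naturally onto one MapReduce job: the map phase assigns each point to its nearest centroid, and the reduce phase recomputes each centroid as the mean of its assigned points. The Java sketch below, written against the Hadoop MapReduce API, assumes points are stored one per line as comma-separated doubles and that the current centroids reach the mappers through the job Configuration under a hypothetical key "kmeans.centroids".

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** One K-means iteration as a MapReduce job: map assigns each point to its
 *  nearest centroid; reduce recomputes each centroid as the cluster mean. */
public class KMeansIteration {

    static double[] parsePoint(String line) {
        String[] parts = line.split(",");
        double[] p = new double[parts.length];
        for (int i = 0; i < parts.length; i++) p[i] = Double.parseDouble(parts[i].trim());
        return p;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; sum += d * d; }
        return sum;
    }

    public static class AssignMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        private final List<double[]> centroids = new ArrayList<>();

        @Override
        protected void setup(Context context) {
            // Current centroids are assumed to arrive via the job Configuration,
            // one centroid per ';'-separated group (hypothetical key "kmeans.centroids").
            String s = context.getConfiguration().get("kmeans.centroids");
            for (String c : s.split(";")) centroids.add(parsePoint(c));
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            double[] point = parsePoint(value.toString());
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < centroids.size(); i++) {
                double d = squaredDistance(point, centroids.get(i));
                if (d < bestDist) { bestDist = d; best = i; }
            }
            // Emit (nearest-centroid index, point) so the reducer can average the cluster.
            context.write(new IntWritable(best), value);
        }
    }

    public static class RecomputeReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double[] sum = null;
            long count = 0;
            for (Text v : values) {
                double[] p = parsePoint(v.toString());
                if (sum == null) sum = new double[p.length];
                for (int i = 0; i < p.length; i++) sum[i] += p[i];
                count++;
            }
            StringBuilder mean = new StringBuilder();
            for (int i = 0; i < sum.length; i++) {
                if (i > 0) mean.append(",");
                mean.append(sum[i] / count);
            }
            // Output (centroid index, new centroid) for the next iteration.
            context.write(key, new Text(mean.toString()));
        }
    }
}

In such a design, a driver program runs this job repeatedly, feeding each iteration's output centroids back into the Configuration, until the centroids stop moving or a maximum iteration count is reached; this is the standard way an iterative algorithm like K-means is expressed on MapReduce.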
