Traditional data mining algorithms encounter severe bottlenecks when applied to large data sets. The emergence of cloud computing has addressed the bottlenecks in massive data storage and computation and has made massive data mining possible. The MapReduce programming framework hides underlying implementation details such as the distributed file system, job scheduling, and fault tolerance, which greatly reduces the difficulty of programming. MapReduce has become an important technology in cloud computing. This paper studies the k-means clustering algorithm and, to address the blindness and randomness with which k-means selects its initial cluster centers, proposes a parallel Canopy k-means algorithm based on the MapReduce programming model.
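As an illustration of the idea only, the following single-machine sketch shows how a Canopy pass can supply initial centers to k-means in place of random selection. It is not the paper's implementation: the function names `canopy` and `kmeans`, the thresholds `t1` (loose) and `t2` (tight), the use of Euclidean distance, and the fixed iteration count are all assumptions made for this example. The comments mark which steps would plausibly correspond to map and reduce phases in a MapReduce job.

```python
import math  # math.dist requires Python 3.8+


def canopy(points, t1, t2):
    """Hypothetical Canopy pass: returns a list of (center, members) pairs.

    t1 is the loose threshold and t2 the tight threshold (t1 > t2 > 0);
    both are assumed parameters, not values taken from the paper.
    """
    assert t1 > t2 > 0
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next candidate point as a new canopy center.
        center = remaining.pop(0)
        # Every point within t1 of the center belongs to this canopy
        # (canopies may overlap).
        members = [p for p in points if math.dist(center, p) <= t1]
        canopies.append((center, members))
        # Points within t2 of the center can no longer become centers.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return canopies


def kmeans(points, centers, iterations=10):
    """Plain k-means iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # "Map"-like step: assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # "Reduce"-like step: recompute each center as the mean of its
        # cluster, keeping the old center if the cluster is empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centers)
        ]
    return centers


def canopy_kmeans(points, t1, t2, iterations=10):
    """Seed k-means with Canopy centers; k equals the number of canopies."""
    seeds = [center for center, _ in canopy(points, t1, t2)]
    return kmeans(points, seeds, iterations)
```

In this sketch the number of clusters k is not chosen in advance; it follows from the number of canopies produced by the chosen thresholds, which is one way the Canopy step can reduce the arbitrariness of the initial centers.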