With the rapid growth of the Internet, huge volumes of documents must be processed in a short time. In this paper, the authors describe how document clustering for large collections can be implemented efficiently with MapReduce. Hadoop provides a convenient and flexible framework for distributed computing on a cluster of commodity machines. The design and implementation of TF-IDF weighting and the K-Means algorithm on MapReduce are presented, along with improvements to the efficiency and effectiveness of the clustering. Finally, the authors report experimental results and related discussion.
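To make the MapReduce formulation of K-Means concrete, here is a minimal single-process sketch of one iteration: the map phase assigns each document vector to its nearest centroid, and the reduce phase averages the vectors assigned to each centroid. This is an illustrative assumption about the general technique, not the authors' Hadoop implementation; all names below are hypothetical.

```python
# MapReduce-style sketch of one K-Means iteration (illustrative only;
# not the paper's Hadoop code).
from collections import defaultdict
import math

def map_phase(points, centroids):
    """Map: emit (nearest-centroid index, point) for each input point."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))
        yield idx, p

def reduce_phase(pairs):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {idx: tuple(sum(coord) / len(pts) for coord in zip(*pts))
            for idx, pts in groups.items()}

points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
new_centroids = reduce_phase(map_phase(points, centroids))
```

In a real Hadoop job, the map and reduce functions run on partitions of the document collection across the cluster, and the updated centroids are broadcast to the mappers for the next iteration until convergence.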