International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE)
Data is increasing day by day with the development of information technology. Extracting the required information from huge amounts of data is a complex and time-consuming process. Clustering can be considered the most important unsupervised learning technique in data mining. K-means clustering is a traditional and popular cluster analysis method in data mining, but it is not suitable for large volumes of unstructured data sets. Therefore, this paper proposes k-means clustering with the Hadoop MapReduce technique. Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel on large clusters.
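To make the idea concrete, the following is a minimal single-process sketch of how one k-means iteration can be split into a map phase and a reduce phase. It assumes a common formulation in which the mapper assigns each point to its nearest centroid and the reducer averages the points assigned to each centroid. It does not use the Hadoop API, and the functions `map_phase`, `reduce_phase` and `kmeans` are hypothetical names chosen for illustration, not the paper's implementation.

```python
import math
from collections import defaultdict


def nearest_centroid(point, centroids):
    # Index of the centroid with the smallest Euclidean distance to the point.
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points, centroids):
    # Hypothetical mapper: emit (centroid_index, (point, 1)) for each input point.
    for p in points:
        yield nearest_centroid(p, centroids), (p, 1)


def reduce_phase(pairs):
    # Hypothetical shuffle + reducer: group by centroid index,
    # sum coordinates and counts, then divide to get the new centroid.
    sums = {}
    counts = defaultdict(int)
    for key, (p, c) in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], p)]
        else:
            sums[key] = list(p)
        counts[key] += c
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    # Driver loop: each iteration would correspond to one MapReduce job.
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, centroids))
        # Keep the old centroid if no point was assigned to it.
        updated = [new.get(i, c) for i, c in enumerate(centroids)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    print(kmeans(data, [(1.0, 1.0), (8.0, 8.0)]))
```

On a real cluster the mapper and reducer would run as separate tasks over partitions of the input, and a combiner could pre-sum the partial coordinate sums and counts on each node to reduce the data shuffled to the reducers. This sketch only shows the data flow under those assumptions.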