The International Journal of Innovative Research in Computer and Communication Engineering
Over the past years, large amounts of structured and unstructured data have been collected from various sources. These volumes of data are difficult to handle on a single machine, which requires the work to be distributed across a large number of computers. Hadoop is one such distributed framework; it processes data in a distributed manner using the MapReduce programming model. For MapReduce to work, it must divide the workload across the machines in the cluster.
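The MapReduce model described above can be illustrated with a minimal single-machine sketch in Python. This is not Hadoop's actual implementation, only an illustration of the three conceptual phases — map, shuffle, and reduce — using a hypothetical word-count job; in a real cluster, the framework runs the map and reduce functions on different machines and performs the shuffle over the network.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs for every word in every document.
    # In Hadoop, each mapper would run on a split of the input data.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does before assigning keys to reducers.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["hadoop maps data", "hadoop reduces data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"hadoop": 2, "maps": 1, "data": 2, "reduces": 1}
```

Dividing the input into splits that are mapped independently is what lets the workload spread across a cluster: each machine only needs its own slice of the data and the (small) map and reduce functions.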