Provided by: International Journal of Computer Science and Mobile Computing (IJCSMC)
Topic: Data Management
Date Added: May 2015
MapReduce has emerged as a popular paradigm for processing large datasets in parallel over a cluster. Hadoop is an open source implementation of MapReduce, which is very attractive for parallel processing of a variety of different applications, e.g., web crawling, log processing, video and image analysis, recommendation systems, etc. Multiple users with various types of workloads share MapReduce cluster. When a group of jobs are simultaneously submitted to a MapReduce cluster, they compete for the pooled resources and the overall system performance in terms of job response times might be degraded.