An Enhanced Framework for Performance Optimization of Apache Hadoop

Provided by: Creative Commons
Topic: Big Data
Format: PDF
Hadoop is a well-known implementation of the MapReduce framework for running data-transformation jobs on clusters of commodity servers. In Hadoop, these jobs execute in parallel as multiple map and reduce tasks. The main objective of the proposed system is a design improvement to the shuffle mechanism used by reduce tasks, intended to significantly improve the performance of MapReduce jobs in a Hadoop cluster. In the existing design, the shuffle phase is coupled to reduce tasks, and this coupling delays job completion. It leaves the potential parallelism between multiple waves of map and reduce tasks unexploited. It fails to address data-distribution skew among reduce tasks. It also makes task scheduling inefficient.
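To make the reduce-side skew problem concrete, here is a minimal sketch. It is not the proposed system's method. It only shows how key-based partitioning can overload a single reduce task. Hadoop's default `HashPartitioner` assigns a key to reduce task `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch reimplements that rule over plain `String` keys so it runs without Hadoop on the classpath. Note that `String.hashCode()` is a stand-in here, because Hadoop's `Text` keys hash differently. The class name, the helper methods, and the workload (one "hot" key carrying half of all records) are hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReducerSkewDemo {

    // Same rule as Hadoop's default HashPartitioner, reimplemented over String keys
    // (String.hashCode() stands in for the Writable key's hash; illustrative only).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many map-output records each reduce task would have to fetch and process.
    static long[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        long[] counts = new long[numReduceTasks];
        for (String key : keys) {
            counts[partition(key, numReduceTasks)]++;
        }
        return counts;
    }

    // Busiest reducer's load divided by the mean load; 1.0 means perfectly balanced.
    static double skewRatio(long[] counts) {
        long total = 0;
        long max = 0;
        for (long c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        double mean = (double) total / counts.length;
        return max / mean;
    }

    // Hypothetical workload: many ordinary keys plus one "hot" key with many records.
    static List<String> skewedKeys(int distinctKeys, int recordsPerKey, int hotRecords) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < distinctKeys; i++) {
            for (int r = 0; r < recordsPerKey; r++) {
                keys.add("key-" + i);
            }
        }
        for (int r = 0; r < hotRecords; r++) {
            keys.add("hot"); // every "hot" record hashes to the same reduce task
        }
        return keys;
    }

    public static void main(String[] args) {
        int reducers = 4;
        long[] counts = recordsPerReducer(skewedKeys(1000, 10, 10_000), reducers);
        System.out.println("records per reducer: " + Arrays.toString(counts));
        System.out.printf("max/mean load = %.2f%n", skewRatio(counts));
    }
}
```

Because all records sharing a key must reach the same reducer, the hot key alone gives one task at least twice the mean load in this workload. Adding more reduce tasks cannot split that key. Addressing such imbalance is part of what the abstract says the existing shuffle design fails to do.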