LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud
This paper investigates the problem of partitioning skew in MapReduce-based systems. The authors' experiments with Hadoop, a widely used MapReduce implementation, show that partitioning skew causes a large amount of data transfer during the shuffle phase and significant unfairness in reduce input across data nodes. As a result, applications suffer performance degradation from the long data transfer in the shuffle phase and from computation skew, particularly in the reduce phase. To address this, the authors develop LEEN, a novel algorithm for locality-aware and fairness-aware key partitioning in MapReduce.
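To make the problem concrete, the following minimal sketch shows how plain hash partitioning can produce skewed reduce input and shuffle traffic when key frequencies are uneven. It is an illustration of partitioning skew only, not the LEEN algorithm. The function names, the use of CRC32 as the hash, the toy key counts, and the assumption that reducer i runs on node i are all hypothetical choices made for this example.

```python
import zlib
from typing import Dict, List


def hash_partition(key: str, num_reducers: int) -> int:
    """Assign a key to a reducer by hashing it (CRC32 is an arbitrary, stable choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_input_sizes(nodes: List[Dict[str, int]], num_reducers: int) -> List[int]:
    """Total number of records each reducer receives under hash partitioning."""
    sizes = [0] * num_reducers
    for node_counts in nodes:
        for key, count in node_counts.items():
            sizes[hash_partition(key, num_reducers)] += count
    return sizes


def shuffled_records(nodes: List[Dict[str, int]], num_reducers: int) -> int:
    """Records that must cross the network, assuming reducer i runs on node i."""
    moved = 0
    for node_id, node_counts in enumerate(nodes):
        for key, count in node_counts.items():
            if hash_partition(key, num_reducers) != node_id:
                moved += count
    return moved


def skew_ratio(sizes: List[int]) -> float:
    """Largest reduce input divided by the mean; 1.0 means perfectly fair."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Toy map output: per-node record counts for each intermediate key.
# Key "a" is far more frequent than the others, which creates the skew.
nodes = [
    {"a": 90, "b": 5, "c": 5},    # node 0
    {"a": 80, "b": 10, "c": 10},  # node 1
]

sizes = reduce_input_sizes(nodes, num_reducers=2)
print("reduce input per reducer:", sizes)
print("skew ratio (max / mean):", skew_ratio(sizes))
print("records shuffled across nodes:", shuffled_records(nodes, num_reducers=2))
```

Because the hash ignores both key frequency and where each key's records are stored, whichever reducer receives key "a" gets at least 170 of the 200 records. A partitioner that accounts for locality and fairness, as LEEN aims to, would use exactly the per-node frequency information that this sketch keeps in `nodes`.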