Garbage Collection Auto-Tuning for Java MapReduce on Multi-Cores
MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. Its success has led to a variety of implementations for different computational scenarios. In this paper, the authors present MRJ, a MapReduce Java framework for multi-core architectures. They evaluate its scalability on a four-core, hyper-threaded Intel Core i7 processor using a set of standard MapReduce benchmarks. They show that Java runtime garbage collection has a significant impact on the performance and scalability of MRJ, and they propose memory management auto-tuning techniques based on machine learning to address it.