The Limitation of MapReduce: A Probing Case and a Lightweight Solution
MapReduce is arguably the most successful parallelization framework, especially for processing large data sets in datacenters built from commodity computers. However, sophisticated applications have proven difficult to port to MapReduce, despite the existence of numerous parallelization opportunities. Intrinsically, the MapReduce design allows a program to scale up to handle extremely large data sets, but constrains a program's ability to process smaller data items and to exploit variable degrees of parallelization opportunities, which are likely to be the common case in general applications. In this paper, the authors analyze the limitations of MapReduce and present the design and implementation of a new lightweight parallelization framework, MRlite.