Upper and Lower Bounds on the Cost of a MapReduce Computation
In this paper, the authors study the tradeoff between parallelism and communication cost in a map-reduce computation. For any problem that is not "Embarrassingly parallel," the finer they partition the work of the reducers so that more parallelism can be extracted, the greater will be the total communication between mappers and reducers. They introduce a model of problems that can be solved in a single round of map-reduce computation. This model enables a generic recipe for discovering lower bounds on communication cost as a function of the maximum number of inputs that can be assigned to one reducer.