Date Added: Aug 2010
MapReduce job execution typically occurs in sequential phases of parallel steps. These phases can experience unpredictable delays when available computing and network capacities fluctuate or when there are large disparities in inter-node communication delays, as can occur on shared compute clouds. The authors propose a pipeline-based scheduling strategy, called speculative pipelining, which uses speculative prefetching and computing to minimize execution delays in subsequent stages due to varying resource availability. Their proposed method can mask the time required to perform speculative operations by overlapping with other ongoing operations. They introduce the notion of "Open-option" prefetching, which, via coding techniques, allows speculative prefetching to begin even before knowing exactly which input will be needed.