On the Feasibility of Byzantine Fault-Tolerant MapReduce in Clouds-of-Clouds
MapReduce is a framework for processing large data sets largely used in cloud computing. MapReduce implementations like Hadoop can tolerate crashes and file corruptions, but there is evidence that general arbitrary faults do occur and can affect the correctness of job executions. Furthermore, many individual cloud outages have been reported, raising concerns about depending on a single cloud. The authors present a MapReduce runtime that tolerates arbitrary faults and runs in a set of clouds at a reasonable cost in terms of computation and execution time. The main challenge is to avoid sending through the internet the huge amount of data that would normally be exchanged between map and reduce tasks.