Ohio State University
As datasets grow in size, data management and processing needs are being met with added parallelism, i.e., by involving more nodes and/or cores in the system. This, in turn, increases the chance of failures during processing. In this paper, the authors present the design and implementation of a fault-tolerant environment for processing queries on large scientific datasets. Their system meets the following three requirements, which they consider essential for any such environment: high efficiency in executing a particular data analysis task or query when there are no failures, the ability to handle the failure of up to a certain number of nodes, and only a modest slowdown in the processing time of a data analysis task or query when there are failures.