Data Management

A Fault-Tolerant Environment for Large-Scale Query Processing

Executive Summary

As datasets grow in size, data management and processing needs are being met with added parallelism, i.e., by involving more nodes and/or cores in the system. This, in turn, increases the chance of failures during processing. In this paper, the authors present the design and implementation of a fault-tolerant environment for processing queries on large scientific datasets. Their system meets three requirements that they consider essential for any such environment: (1) high efficiency when executing a data analysis task or query in the absence of failures; (2) the ability to handle the failure of up to a certain number of nodes; and (3) only a modest slowdown in the processing time of a data analysis task or query when failures do occur.

  • Format: PDF
  • Size: 515.7 KB