OTPM: Failure Handling in Data-Intensive Analytical Processing
Parallel processing is key to speeding up large-scale analytical workloads and achieving high throughput. However, failures of nodes involved in an analytical query can interrupt the whole process, forcing a complete restart of the query if the system lacks query-level fault tolerance. A complete restart can be too costly for queries over very large databases and may fail to meet the time constraints of decision support systems. In this paper, the authors present an approach, called operator tracking, that resumes query processing after a failure by keeping track of the point up to which data has been processed by an operator.
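The idea of resuming from a tracked point can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TrackedOperator`, `NodeFailure`, and `run_with_resume` are hypothetical, the failure is simulated, and the tracked position is assumed to survive the failure (in a real system it would have to be persisted or replicated away from the failed node).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class NodeFailure(Exception):
    """Simulated failure of the node running the operator (hypothetical)."""


@dataclass
class TrackedOperator:
    # Hypothetical operator wrapper. `position` is the index of the next
    # input row to process; `output` holds the results produced so far.
    # Assumption: both fields survive a failure.
    fn: Callable[[int], int]
    position: int = 0
    output: List[int] = field(default_factory=list)

    def run(self, rows: List[int], fail_at: Optional[int] = None) -> List[int]:
        # Start from the tracked position instead of from row 0.
        for i in range(self.position, len(rows)):
            if fail_at is not None and i == fail_at:
                raise NodeFailure(f"node failed before processing row {i}")
            self.output.append(self.fn(rows[i]))
            self.position = i + 1  # record progress after each processed row
        return self.output


def run_with_resume(op: TrackedOperator, rows: List[int], fail_at: int) -> List[int]:
    """Run the operator, and on a simulated failure resume from the tracked point."""
    try:
        return op.run(rows, fail_at=fail_at)
    except NodeFailure:
        # No complete restart: rows before op.position are not reprocessed.
        return op.run(rows)


calls: List[int] = []  # records every row the operator function actually sees


def double(x: int) -> int:
    calls.append(x)
    return x * 2


rows = [1, 2, 3, 4, 5]
op = TrackedOperator(fn=double)
result = run_with_resume(op, rows, fail_at=3)
```

In this sketch each row reaches the operator function exactly once, even though the run is interrupted before the fourth row. A complete restart would instead process the first three rows a second time.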