Toward Predictive Failure Management for Distributed Stream Processing Systems
Source: North Carolina State University
Distributed Stream Processing Systems (DSPSs) have many important applications such as sensor data analysis, network security, and business intelligence. Failure management is essential for DSPSs that often require highly-available system operations. In this paper, the authors explore a new predictive failure management approach that employs online failure prediction to achieve more efficient failure management than previous reactive or proactive failure management approaches. They employ light-weight stream-based classification methods to perform online failure forecast. Based on the prediction results, the system can take differentiated failure preventions on abnormal components only. The failure prediction model is tunable, which can achieve a desired tradeoff between failure penalty reduction and prevention cost based on a user-defined reward function.