University of Idaho
As cloud computing clusters continue to grow, maintaining their health becomes increasingly challenging. This work studies how to efficiently monitor the status of machines in these clusters and how to detect problems or predict them before they occur. Although existing research has addressed hardware failure prediction in general clusters, to the authors' knowledge no work has focused on failure prediction in Hadoop systems specifically. By leveraging the additional system-health information available through a framework such as Hadoop, the authors demonstrate that state-of-the-art failure prediction models can be outperformed by models tailored to the framework.