Issues in Applying Data Mining to Grid Job Failure Detection and Diagnosis
As grid computation systems become larger and more complex, manually diagnosing job failures becomes impractical. Recently, machine-learning techniques have been proposed to detect a variety of application failures in grids. While this is a promising approach, there are many options for applying machine learning to this problem, and it is not always obvious which approaches are feasible or effective. This paper explores some issues that arise when one tries to apply existing implementations of data mining algorithms to diagnose as well as predict job failures in grids.