
Failure Analysis of Distributed Scientific Workflows Executing in the Cloud

This paper presents models characterizing failures observed during the execution of large scientific applications on Amazon EC2. Scientific workflows serve as the underlying abstraction for representing these applications. As scientific workflows scale to hundreds of thousands of distinct tasks, failures due to software and hardware faults become increasingly common. The authors study job failure models using data collected by their Stampede framework from four scientific applications. In particular, they show that a Naive Bayes classifier can accurately predict the failure probability of jobs.

Provided by: IFIP
Topic: Cloud
Date Added: Aug 2012
Format: PDF
