Understanding the Effects and Implications of Compute Node Related Failures in Hadoop

Hadoop has become a critical component in today's cloud environment. Ensuring good performance for Hadoop is paramount for the wide-range of applications built on top of it. In this paper, the authors analyze Hadoop's behavior under failures involving compute nodes. They find that even a single failure can result in inflated, variable and unpredictable job running times, all undesirable properties in a distributed system. They systematically track the causes underlying this distressing behavior. First, they find that Hadoop makes unrealistic assumptions about task progress rates. These assumptions can be easily invalidated by the cloud environment and, more surprisingly, by Hadoop's own design decisions.

Provided by: Association for Computing Machinery Topic: Mobility Date Added: Apr 2012 Format: PDF

Find By Topic