Hadoop's Overload Tolerant Design Exacerbates Failure Detection and Recovery

Data processing frameworks like Hadoop need to handle failures efficiently, since failures are common in today's large-scale data center environments. Failures disrupt the interactions between the framework's processes. Unfortunately, adverse but temporary conditions such as network or machine overload can have a similar effect. Treating this effect without regard to its real underlying cause can lead to a sluggish response to failures. The authors show that this is the case with Hadoop, which couples failure detection and recovery with overload handling in a conservative design with conservative parameter choices.
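The coupling the abstract describes can be illustrated with a minimal sketch of a fixed-timeout heartbeat detector, in the spirit of Hadoop's expiry interval for silent workers (classically a conservative default of 600 seconds, shortened here for illustration). The class and names below are hypothetical, not Hadoop APIs; the point is that a single timeout cannot distinguish a crashed worker from an overloaded one whose heartbeats are merely delayed.

```python
import time  # not strictly needed for the demo; real detectors use wall-clock time

# Hedged sketch: a fixed-timeout failure detector. EXPIRY_INTERVAL stands in
# for Hadoop's conservative heartbeat-expiry parameter (illustrative value).
EXPIRY_INTERVAL = 5.0  # seconds a worker may stay silent before being declared dead

class FailureDetector:
    def __init__(self, expiry_interval=EXPIRY_INTERVAL):
        self.expiry_interval = expiry_interval
        self.last_heartbeat = {}  # worker id -> timestamp of last heartbeat

    def heartbeat(self, worker, now):
        self.last_heartbeat[worker] = now

    def dead_workers(self, now):
        # A single timeout conflates two very different situations: a worker
        # that crashed and one that is alive but overloaded -- both look
        # "silent" once the expiry interval elapses.
        return [w for w, t in self.last_heartbeat.items()
                if now - t > self.expiry_interval]

fd = FailureDetector()
fd.heartbeat("crashed-node", now=0.0)     # never heartbeats again
fd.heartbeat("overloaded-node", now=0.0)  # alive, but its next heartbeat is delayed past expiry

# At t=6s, both exceed the timeout and are classified identically.
print(fd.dead_workers(now=6.0))  # -> ['crashed-node', 'overloaded-node']
```

Making the timeout aggressive speeds up recovery from real failures but falsely evicts overloaded-yet-healthy workers; making it conservative, as Hadoop does, slows failure response. This tension is exactly what the paper examines.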

Provided by: Association for Computing Machinery | Topic: Mobility | Date Added: Jun 2011 | Format: PDF