Reining in the Outliers in Map-Reduce Clusters using Mantri

Provided by: Microsoft
Topic: Big Data
Format: PDF
Experience from an operational MapReduce cluster reveals that outliers significantly prolong job completion. The causes of outliers include run-time contention for processor, memory, and other resources; disk failures; varying bandwidth and congestion along network paths; and imbalance in task workload. The authors present Mantri, a system that monitors tasks and culls outliers using cause-aware and resource-aware techniques. Mantri's strategies include restarting outliers, network-aware placement of tasks, and protecting the outputs of valuable tasks. Using real-time progress reports, Mantri detects and acts on outliers early in their lifetime.
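
As a rough illustration of detecting outliers from progress reports, the sketch below flags tasks whose estimated remaining time is well above the median for their peers. It is not Mantri's actual algorithm: the function names, the linear extrapolation of remaining time, and the 1.5x median threshold are all assumptions made for this example.

```python
from statistics import median


def estimate_remaining(elapsed_s, progress):
    """Linearly extrapolate remaining seconds from a progress fraction in (0, 1].

    Assumption for this sketch: a task advances at a constant rate.
    """
    return elapsed_s * (1.0 - progress) / progress


def find_outliers(reports, threshold=1.5):
    """Return sorted task ids whose estimated remaining time exceeds
    threshold * median of all estimates.

    reports: dict mapping task id -> (elapsed seconds, progress fraction).
    Tasks that have reported no progress yet are skipped, since there is
    nothing to extrapolate from. The threshold value is hypothetical.
    """
    remaining = {
        task_id: estimate_remaining(elapsed, progress)
        for task_id, (elapsed, progress) in reports.items()
        if progress > 0
    }
    if not remaining:
        return []
    cutoff = threshold * median(remaining.values())
    return sorted(task_id for task_id, r in remaining.items() if r > cutoff)


if __name__ == "__main__":
    sample = {"t1": (10, 0.5), "t2": (10, 0.5), "t3": (10, 0.1)}
    print(find_outliers(sample))
```

In the sample, the third task has completed only 10% of its work in the same time the others completed 50%, so it is the one flagged. A real system would also weigh the cause of the slowdown and the cost of a restart before acting, as the abstract describes.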
