Why Let Resources Idle? Aggressive Cloning of Jobs With Dolly
Despite prior research on outlier mitigation, the authors' analysis of jobs from the Facebook cluster shows that outliers still occur, especially in small jobs. Small jobs are particularly sensitive to long-running outlier tasks because of their interactive nature. Existing outlier mitigation strategies rely on comparing different tasks of the same job and launching speculative copies of the slower tasks. However, small jobs execute all their tasks simultaneously, which leaves too little time to observe and compare tasks. Building on the observation that clusters are underutilized, the authors take speculation to its logical extreme: they run full clones of jobs to mitigate the effect of outliers.
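
The effect of cloning on a small job can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a small job finishes when its slowest task finishes (since all tasks run at once) and that the result of whichever clone finishes first is used. The function names and task durations are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def job_completion_time(task_durations: Sequence[float]) -> float:
    """All tasks run simultaneously, so the job ends with its slowest task."""
    return max(task_durations)


def cloned_job_completion_time(clones: Sequence[Sequence[float]]) -> float:
    """Assumption: the first clone to finish supplies the job's result."""
    return min(job_completion_time(clone) for clone in clones)


# Hypothetical task durations in seconds; 40.0 is an outlier task.
original = [2.0, 2.5, 40.0, 2.2]
clone = [2.1, 2.4, 2.3, 2.6]

print(job_completion_time(original))                   # 40.0
print(cloned_job_completion_time([original, clone]))   # 2.6
```

In this toy case a single outlier task delays the uncloned job to 40 seconds, while the cloned job completes as soon as the outlier-free clone does. The cost is the extra resources consumed by the clone, which is why the approach depends on the cluster being underutilized.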