Date Added: Sep 2011
The increase in scale and complexity of large compute clusters motivates a need for representative workload bench-marks to evaluate the performance impact of system changes, so as to assist in designing better scheduling algorithms and in carrying out management activities. To achieve this goal, it is necessary to construct workload characterizations from which realistic performance benchmarks can be created. In this paper, the authors focus on characterizing run-time task resource usage for CPU, memory and disk. The goal is to find an accurate characterization that can faithfully reproduce the performance of historical workload traces in terms of key performance metrics, such as task wait time and machine resource utilization.