Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century
The paper explores the consequences of the research community's collective inattention to methodology for innovation, makes recommendations for addressing this problem in one domain, and provides guidelines for other domains. It describes benchmark suite design, experimental design, and analysis for evaluating Java applications. For example, it introduces new criteria for measuring and selecting diverse applications for a benchmark suite. The paper shows that the complexity and nondeterminism of the Java runtime system make experimental design a first-order consideration, and the authors recommend mechanisms for addressing both. Drawing on these results, the paper suggests how to adapt methodology more broadly. To continue to deliver innovations, the field needs to significantly increase participation in, and funding for, developing sound methodological foundations.