Explanation vs Performance in Data Mining: A Case Study with Predicting Runaway Projects
Often, the explanatory power of a learned model must be traded off against its performance. For the task of predicting runaway software projects, the authors show that the twin goals of high performance and good explanatory power are both achievable by applying a combination of data mining techniques (discrimination, feature subset selection, and rule covering algorithms). This result is a new high-water mark in predicting runaway projects. Measured in terms of precision, the new model is as good as can be expected for their data. Other methods might outperform their result in other respects (e.g. by generating a smaller, more explainable model), but no other method could exceed the precision of their learned model.
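Because the headline claim is stated in terms of precision, the short sketch below illustrates how that metric is computed: precision is the fraction of projects a model flags as runaways that really are runaways. The function name `precision`, the `"runaway"`/`"ok"` labels, and the sample data are hypothetical illustrations, not the paper's data or code.

```python
def precision(actual, predicted, positive="runaway"):
    """Fraction of items predicted `positive` that are actually `positive`.

    Returns None when nothing was predicted positive, since precision
    is undefined in that case.
    """
    # Keep the true labels of only those items the model flagged as positive.
    flagged = [a for a, p in zip(actual, predicted) if p == positive]
    if not flagged:
        return None
    true_positives = sum(1 for a in flagged if a == positive)
    return true_positives / len(flagged)


# Hypothetical labels for six projects (not the paper's data).
actual    = ["runaway", "ok", "runaway", "ok", "ok",      "runaway"]
predicted = ["runaway", "ok", "runaway", "ok", "runaway", "ok"]

# Three projects are flagged; two of them really are runaways.
print(precision(actual, predicted))
```

Precision is bounded above by 1.0, a value reached only when every flagged project is a true runaway. Note that it says nothing about runaways the model misses; that is measured separately, by recall.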