Assessing Data Mining Results Via Swap Randomization
The problem of assessing the significance of data mining results on high-dimensional 0-1 datasets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done with standard statistical tests such as the chi-square test, or with other methods. However, the results of such tests depend only on the specific attributes involved and not on the dataset as a whole. Moreover, the tests are difficult to apply to sets of patterns or to other complex results of data mining algorithms. This paper considers a simple randomization technique that addresses these shortcomings.
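To make the idea in the title concrete, the following is a minimal sketch of swap randomization on a 0-1 matrix. It assumes the usual definition of a swap: pick two 1-entries at (i, j) and (k, l) such that (i, l) and (k, j) are both 0, then exchange them, which leaves every row sum and column sum unchanged. The names `swap_randomize`, `margins`, and `num_attempts` are illustrative and do not come from the text above.

```python
import random


def margins(matrix):
    """Return (row sums, column sums) of a 0-1 matrix given as a list of lists."""
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    return row_sums, col_sums


def swap_randomize(matrix, num_attempts, seed=None):
    """Return a randomized copy of `matrix` with the same row and column sums.

    Illustrative sketch: each attempt picks two 1-entries (i, j) and (k, l)
    uniformly at random and performs the swap only if it is valid.
    """
    rng = random.Random(seed)
    data = [row[:] for row in matrix]  # work on a copy, leave the input intact
    ones = [(i, j) for i, row in enumerate(data)
            for j, value in enumerate(row) if value == 1]
    if len(ones) < 2:
        return data

    for _ in range(num_attempts):
        a = rng.randrange(len(ones))
        b = rng.randrange(len(ones))
        i, j = ones[a]
        k, l = ones[b]
        # Valid only if the two "opposite corners" are currently 0.
        # This also rules out i == k or j == l, where a corner would be 1.
        if data[i][l] == 0 and data[k][j] == 0:
            data[i][j] = data[k][l] = 0
            data[i][l] = data[k][j] = 1
            ones[a] = (i, l)
            ones[b] = (k, j)
    return data


if __name__ == "__main__":
    original = [
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 1],
    ]
    randomized = swap_randomize(original, num_attempts=1000, seed=0)
    # Margins are preserved by construction.
    print(margins(original) == margins(randomized))
```

One plausible use of such a routine is to run the same data mining algorithm on the original dataset and on many randomized copies, and to compare the resulting statistic across them. How many swap attempts are needed for adequate mixing is not addressed by this sketch.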