Data Mining: Understanding Data and Disease Modeling
Analyzing large data sets requires a proper understanding of the data beforehand. Such understanding helps domain experts influence the data mining process and properly evaluate the results of a data mining application. This paper introduces an algorithm to identify anomalies in the data. It also proposes an approach for incorporating the results of this data-characteristics checking into a data mining application. The application reported in this paper involves developing a disease model from gene expression data using machine learning techniques. The paper demonstrates that simple models can be generated from a large set of attributes, and shows how the structure of these models changes when potentially anomalous cases are removed.