Data Set Preprocessing and Transformation in a Database System
In general, a significant amount of data mining analysis is performed outside a database system, which creates many data management issues. This paper summarizes the authors' experience and recommendations for performing data set preprocessing and transformation inside a database system (i.e., data cleaning, record selection, summarization, de-normalization, variable creation and coding). Preprocessing is the most time-consuming task in data mining projects, yet this aspect is largely ignored in the literature. The authors present practical issues, common solutions and lessons learned when preparing and transforming data sets with the SQL language, based on experience from real-life projects.
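The sketch below illustrates the idea of preparing a data set entirely inside the database with SQL. It assumes two hypothetical normalized tables, `customer` and `sales`, which are not taken from the paper. It uses SQLite through Python's standard `sqlite3` module rather than the DBMS used in the authors' projects. A single `CREATE TABLE ... AS SELECT` statement applies several of the listed steps. A `LEFT JOIN` de-normalizes the tables, a `WHERE` clause selects records, and `GROUP BY` aggregates summarize the detail rows. `COALESCE` with a mean imputes missing values as a cleaning step, and `CASE` codes a categorical variable as a binary one.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical normalized tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            age         INTEGER,          -- may be NULL (missing value)
            gender      TEXT,
            status      TEXT
        );
        CREATE TABLE sales (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            amount      REAL
        );
        INSERT INTO customer VALUES (1, 35, 'F', 'active'),
                                    (2, NULL, 'M', 'active'),
                                    (3, 50, 'F', 'closed');
        INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 15.5),
                                 (3, 2, 7.0), (4, 3, 100.0);
    """)
    return conn


def build_data_set(conn: sqlite3.Connection) -> list:
    """Build one row per selected customer with a single SQL statement."""
    conn.executescript("""
        DROP TABLE IF EXISTS dataset;
        CREATE TABLE dataset AS
        SELECT c.customer_id,
               -- data cleaning: impute a missing age with the mean age
               COALESCE(c.age, (SELECT AVG(age) FROM customer)) AS age,
               -- variable coding: binary indicator for a categorical attribute
               CASE WHEN c.gender = 'F' THEN 1 ELSE 0 END AS gender_f,
               -- summarization: aggregate the detail table per customer
               COUNT(s.sale_id) AS n_sales,
               COALESCE(SUM(s.amount), 0) AS total_amount
        FROM customer AS c
        -- de-normalization: attach detail rows to each customer
        LEFT JOIN sales AS s ON s.customer_id = c.customer_id
        -- record selection: keep only active customers
        WHERE c.status = 'active'
        GROUP BY c.customer_id, c.age, c.gender;
    """)
    return conn.execute("SELECT * FROM dataset ORDER BY customer_id").fetchall()


if __name__ == "__main__":
    for row in build_data_set(make_sample_db()):
        print(row)
```

The result table `dataset` stays inside the database, so it can be queried or exported without moving the raw tables elsewhere. Putting every step in one statement keeps the sketch short. In practice, splitting the steps into intermediate tables is often easier to debug.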