Provided by: International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Topic: Data Management
One of the basic tasks in data mining is preprocessing the data and preparing the dataset. Data analysis is easier when the dataset is in a horizontal tabular layout, with one row per observation and one column per variable. This paper presents an overview of data preprocessing and dataset preparation techniques using SQL. A standard SQL aggregation returns a single aggregated column, with one row per group, which limits its usefulness for preparing datasets. The authors argue for effective and optimized use of SQL to build datasets with horizontal aggregations, which return one column per group instead. They also propose that when the horizontal layout produced by horizontal aggregation is integrated with the K-means clustering algorithm, it can yield proper clusters.
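The difference between the two layouts can be illustrated with a minimal sketch. It assumes a hypothetical `sales` table with `store`, `quarter`, and `amount` columns, and it uses `SUM(CASE ...)` as one common way to express a horizontal aggregation in standard SQL; the abstract does not say which evaluation method the paper itself uses. The SQL is run through Python's built-in `sqlite3` module so the example is self-contained.

```python
import sqlite3


def aggregate_sales(rows):
    """Return (vertical, horizontal) aggregations of a hypothetical sales table.

    rows: iterable of (store, quarter, amount) tuples. The table and column
    names are illustrative assumptions, not taken from the paper.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

    # Standard (vertical) aggregation: one row per (store, quarter) group,
    # with a single aggregated column.
    vertical = conn.execute(
        """
        SELECT store, quarter, SUM(amount)
        FROM sales
        GROUP BY store, quarter
        ORDER BY store, quarter
        """
    ).fetchall()

    # Horizontal aggregation: one row per store, one column per quarter.
    # The quarter values are hard-coded here; a real pipeline would have to
    # discover them first and generate the CASE expressions.
    horizontal = conn.execute(
        """
        SELECT store,
               SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
               SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
        FROM sales
        GROUP BY store
        ORDER BY store
        """
    ).fetchall()

    conn.close()
    return vertical, horizontal


if __name__ == "__main__":
    sample = [("A", "Q1", 10.0), ("A", "Q1", 5.0), ("A", "Q2", 7.0), ("B", "Q2", 3.0)]
    vertical, horizontal = aggregate_sales(sample)
    print(vertical)
    print(horizontal)
```

Each row of the horizontal result is a fixed-length numeric vector for one store, which is the shape that a clustering algorithm such as K-means expects as input.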