The International Journal of Innovative Research in Computer and Communication Engineering
With an enormous amount of data stored in databases and data warehouses, it is increasingly important to develop powerful tools for analyzing such data and mining interesting knowledge from it. Data mining is the process of inferring knowledge from such large volumes of data. Dirty data is a serious problem: it leads to incorrect decision making and inefficient daily operations, and ultimately wastes both time and money. Data quality refers to the accuracy and completeness of the data. To improve data quality, it is sometimes necessary to clean the data, which can involve removing duplicate records, normalizing the values used to represent information in the database, and accounting for missing data points.
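As a rough illustration of the three cleaning steps named above (normalizing values, removing duplicate records, and accounting for missing data points), the following minimal Python sketch may help. The records, the field names (`name`, `city`, `age`), the helper functions, and the choice to fill missing ages with the mean are all assumptions made for this example; they are not taken from any particular system or from the method of this paper.

```python
from statistics import mean

# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"name": "  Alice Smith ", "city": "new york", "age": "34"},
    {"name": "alice smith", "city": "New York", "age": "34"},
    {"name": "Bob Jones", "city": "BOSTON", "age": ""},
    {"name": "Carol White", "city": "boston", "age": "41"},
]


def normalize(record):
    """Trim and collapse whitespace, unify letter case, map empty strings to None."""
    cleaned = {}
    for key, value in record.items():
        text = " ".join(value.split())
        cleaned[key] = text.title() if text else None
    return cleaned


def remove_duplicates(records):
    """Keep the first occurrence of each record, comparing all fields exactly."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def fill_missing_age(records):
    """Convert ages to numbers and replace missing ones with the mean known age.

    Mean imputation is only one possible strategy, assumed here for brevity.
    """
    known = [int(r["age"]) for r in records if r["age"] is not None]
    fallback = mean(known) if known else None
    return [
        dict(r, age=int(r["age"]) if r["age"] is not None else fallback)
        for r in records
    ]


def clean(records):
    """Normalize first so that differently formatted duplicates compare equal."""
    normalized = [normalize(r) for r in records]
    return fill_missing_age(remove_duplicates(normalized))


if __name__ == "__main__":
    for row in clean(RAW_RECORDS):
        print(row)
```

Normalization runs before duplicate removal in this sketch because two records that differ only in spacing or letter case would otherwise not be recognized as the same record. Real data usually also calls for approximate matching, which this exact-match example does not attempt.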