Open User Involvement in Data Cleaning for Data Warehouse Quality

Date Added: Dec 2011
Format: PDF

A high-quality data warehouse is key to making sound strategic decisions. Data cleaning is a program that deals with quality problems in data extracted from operational sources before it is loaded into the data warehouse. Because data cleaning can itself introduce errors, and some data must be cleaned manually, there is a need for open user involvement in data cleaning for data warehouse quality. This involvement is essential so that users can validate the cleaned data, replace the dirty data in the original sources, and correct poor data that cannot be cleaned automatically. In this paper, the authors extend the data cleaning and Extract-Transform-Load (ETL) processes to better support user involvement in data quality management.
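
The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' design, which the abstract does not detail; it only assumes a hypothetical cleaning rule for an email column and invented helper names (clean_email, clean_batch, apply_user_corrections, source_feedback). Rows the rule cannot repair go to a review list for a user, and every change is recorded so it could be written back to the operational source. User validation of the automatically cleaned rows is left out for brevity.

```python
def clean_email(value):
    """Hypothetical cleaning rule: trim and lowercase an email address.

    Returns None when the value cannot be repaired automatically.
    """
    if not isinstance(value, str):
        return None
    candidate = value.strip().lower()
    if candidate.count("@") != 1:
        return None
    local, domain = candidate.split("@")
    if not local or "." not in domain:
        return None
    return candidate


def clean_batch(rows):
    """Split extracted rows into auto-cleaned rows and rows needing a user."""
    cleaned, needs_review = [], []
    for row in rows:
        fixed = clean_email(row.get("email"))
        if fixed is None:
            needs_review.append(dict(row))
        else:
            # Keep the original value so the change can be audited later.
            cleaned.append({**row, "email": fixed, "original_email": row.get("email")})
    return cleaned, needs_review


def apply_user_corrections(needs_review, corrections):
    """Apply user-supplied fixes (row id -> corrected email) to reviewed rows."""
    resolved, unresolved = [], []
    for row in needs_review:
        fixed = clean_email(corrections.get(row["id"]))
        if fixed is None:
            unresolved.append(row)  # no usable correction yet; do not load
        else:
            resolved.append({**row, "email": fixed, "original_email": row.get("email")})
    return resolved, unresolved


def source_feedback(rows):
    """List the changes that could be written back to the original source."""
    return [
        {"id": r["id"], "old": r["original_email"], "new": r["email"]}
        for r in rows
        if r["original_email"] != r["email"]
    ]


# Made-up sample data, for illustration only.
rows = [
    {"id": 1, "email": "  Alice@Example.COM "},
    {"id": 2, "email": "bob(at)example.com"},
    {"id": 3, "email": "carol@example.com"},
]
cleaned, needs_review = clean_batch(rows)
resolved, unresolved = apply_user_corrections(needs_review, {2: "bob@example.com"})
updates = source_feedback(cleaned + resolved)
```

In this sketch only rows in cleaned and resolved would continue to the load step, while unresolved rows stay out of the warehouse until a user supplies a usable correction.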