ERACER: A Database Approach for Statistical Inference and Data Cleaning

Real-world databases often contain syntactic and semantic errors, despite the integrity constraints and other safety measures built into modern DBMSs. The authors present ERACER, an iterative statistical framework that infers missing information and corrects such errors automatically. The approach is based on belief propagation and relational dependency networks, and includes an efficient approximate inference algorithm that can be implemented in standard DBMSs using SQL and user-defined functions. The system performs inference and cleansing in an integrated manner, using shrinkage techniques to infer correct values accurately even in the presence of dirty data.

Provided by: Association for Computing Machinery
Topic: Data Management
Date Added: Jun 2010
Format: PDF
