Date Added: Apr 2011
Querying databases with incomplete or inconsistent content remains a broad and difficult problem. In this paper, the authors study how to improve aggregations computed on databases with referential errors in the context of database integration, where each source database has its own tables, columns whose content is similar across databases, and different referential integrity constraints. Thus, a query on an integrated database may involve tables and columns with referential integrity errors. In a data warehouse, even though the ETL processes fix referential integrity errors, this is generally done by inserting "Dummy" records into the dimension tables to match the invalid foreign keys, thereby artificially enforcing referential integrity.
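
The effect of the "Dummy" record practice on an aggregation can be illustrated with a small sketch. This is not the authors' method; it only shows the ETL repair the abstract describes, using Python's standard `sqlite3` module. The table names (`store`, `sales`), the columns, the sample rows, and the dummy key `-1` are all hypothetical.

```python
import sqlite3


def demo_dummy_dimension():
    """Compare a grouped SUM before and after an ETL-style 'Dummy' repair."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical dimension and fact tables; no foreign key is declared,
    # so invalid references can be loaded, as in an integrated database.
    cur.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT)")
    cur.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL)"
    )
    cur.executemany("INSERT INTO store VALUES (?, ?)", [(1, "Austin"), (2, "Boston")])
    cur.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            (1, 1, 100.0),
            (2, 2, 50.0),
            (3, 9, 25.0),     # store 9 does not exist: referential error
            (4, None, 10.0),  # missing foreign key
        ],
    )

    query = (
        "SELECT s.city, SUM(f.amount) FROM sales f "
        "JOIN store s ON f.store_id = s.store_id "
        "GROUP BY s.city ORDER BY s.city"
    )

    # Before the repair, the join silently drops the two invalid rows.
    before = cur.execute(query).fetchall()

    # ETL-style repair: add a 'Dummy' dimension row and point every
    # invalid or missing foreign key at it.
    cur.execute("INSERT INTO store VALUES (-1, 'Dummy')")
    cur.execute(
        "UPDATE sales SET store_id = -1 "
        "WHERE store_id IS NULL OR store_id NOT IN (SELECT store_id FROM store)"
    )
    after = cur.execute(query).fetchall()

    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo_dummy_dimension()
    print("before repair:", before)
    print("after repair: ", after)
```

In this sketch the repaired query no longer loses the invalid rows, but their amounts are lumped into a single "Dummy" group instead of being attributed to any real dimension value, which is the sense in which referential integrity is enforced only artificially.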