Data Management

Referential Integrity Quality Metrics

Referential integrity is an essential global constraint in a relational database that keeps the database in a complete and consistent state. In this paper, the authors assume that the database may violate referential integrity and that relations may be denormalized. They propose a set of quality metrics, defined at four granularity levels (database, relation, attribute, and value), that measure referential completeness and consistency. The metrics are computed efficiently with standard SQL queries that incorporate two query optimizations: left outer joins on foreign keys and early foreign key grouping. Experiments evaluate the proposed metrics and SQL query optimizations on real and synthetic databases, showing that they can help detect and explain referential errors.
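To make the two query optimizations concrete, the following is a minimal sketch of how an attribute-level measurement might be computed. It is not the authors' implementation. The table and column names (customers, orders, customer_id), the helper referential_metrics, and the two fractions it returns (rows with a null foreign key, and rows whose foreign key matches no referenced key) are assumptions chosen for illustration; the paper's exact metric definitions may differ. The query groups the referencing relation by foreign key value first, then applies a left outer join so that unmatched values survive the join and can be counted.

```python
import sqlite3


def referential_metrics(conn, child, fk, parent, pk):
    """Return (null_fraction, invalid_fraction) for child.fk -> parent.pk.

    Hypothetical helper for illustration. Identifiers are interpolated
    directly into the SQL text, so they must come from trusted code.
    """
    query = f"""
        SELECT
            SUM(CASE WHEN g.fk_value IS NULL THEN g.cnt ELSE 0 END),
            SUM(CASE WHEN g.fk_value IS NOT NULL AND p.{pk} IS NULL
                     THEN g.cnt ELSE 0 END),
            SUM(g.cnt)
        FROM (
            -- Early foreign key grouping: one row per distinct FK value.
            SELECT {fk} AS fk_value, COUNT(*) AS cnt
            FROM {child}
            GROUP BY {fk}
        ) AS g
        -- Left outer join: FK values with no match keep a NULL parent key.
        LEFT OUTER JOIN {parent} AS p ON g.fk_value = p.{pk}
    """
    null_rows, invalid_rows, total_rows = conn.execute(query).fetchone()
    if not total_rows:  # empty referencing relation: nothing to measure
        return 0.0, 0.0
    return null_rows / total_rows, invalid_rows / total_rows


# Toy database with the constraint deliberately left unenforced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(1,), (1,), (2,), (None,), (9,), (9,), (9,), (3,)],
)

metrics = referential_metrics(conn, "orders", "customer_id", "customers", "id")
print(metrics)  # (0.125, 0.375): 1 of 8 rows is null, 3 of 8 reference missing key 9
```

Grouping before the join means the join processes one row per distinct foreign key value rather than one row per referencing tuple, which is where the saving would come from when a few key values are repeated many times.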