Data Management

A Referential Integrity Browser for Distributed Databases

Executive Summary

The authors demonstrate a program that inspects a distributed relational database over the Internet to discover and quantify referential integrity issues for integration purposes. The program computes data quality metrics for referential integrity at four levels of granularity (database, table, column, and value), moving from a global view to a detailed one that exhibits specific evidence of referential errors. Two orthogonal data quality dimensions are considered: completeness and consistency. Each table is stored at one primary site and can be replicated at multiple sites, and its foreign keys can reference tables at the same site or at different sites. The user can choose among alternative query evaluation strategies to compute the referential error metrics efficiently.
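To make the column-level and value-level metrics concrete, the following is a minimal sketch, not the authors' program. It assumes that a completeness error is a NULL foreign key and that a consistency error is a non-NULL foreign key with no matching primary key; the summary itself does not define the two dimensions, so both definitions are assumptions. The tables `city` and `customer`, the function `referential_metrics`, and the metric names are hypothetical, and a single in-memory SQLite database stands in for one site, so distribution and replication are not modeled.

```python
import sqlite3


def referential_metrics(conn, child, fk, parent, pk):
    """Column- and value-level referential error metrics for one foreign key.

    Assumed definitions (not taken from the source):
      completeness error = fraction of child rows whose foreign key is NULL
      consistency error  = fraction of child rows whose non-NULL foreign key
                           has no matching primary key in the parent table
    Table and column names are trusted identifiers in this sketch.
    """
    cur = conn.cursor()
    total = cur.execute(f"SELECT COUNT(*) FROM {child}").fetchone()[0]
    nulls = cur.execute(
        f"SELECT COUNT(*) FROM {child} WHERE {fk} IS NULL"
    ).fetchone()[0]
    # Value level: each dangling foreign key value with its frequency.
    invalid_values = cur.execute(
        f"SELECT c.{fk}, COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL "
        f"GROUP BY c.{fk} ORDER BY c.{fk}"
    ).fetchall()
    invalid = sum(count for _, count in invalid_values)
    return {
        "total_rows": total,
        "null_count": nulls,
        "invalid_count": invalid,
        "completeness_error": nulls / total if total else 0.0,
        "consistency_error": invalid / total if total else 0.0,
        "invalid_values": invalid_values,
    }


def build_demo():
    """Create a small hypothetical database with known referential errors."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY)")
    # The foreign key is deliberately undeclared so that bad rows can exist.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city_id INTEGER)")
    conn.executemany("INSERT INTO city VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, 1), (2, 2), (3, None), (4, 9), (5, 9)],
    )
    return conn


conn = build_demo()
metrics = referential_metrics(conn, "customer", "city_id", "city", "id")
print(metrics)
```

In this sketch the column-level result is the pair of error fractions for `customer.city_id`, and the value-level result is the list of dangling key values with their frequencies. Table-level and database-level metrics could be obtained by aggregating these counts over all foreign keys of a table and over all tables, respectively, which is again an assumption about how the levels relate.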

