Efficient and Effective Analysis of Data Quality Using Pattern Tableaux
Data Auditor is a system for analyzing data quality by exploring data semantics. Given a user-supplied constraint, such as a functional dependency or an inclusion dependency, the system computes pattern tableaux: concise summaries of the subsets of the data that satisfy (or fail) the constraint. At the core of Data Auditor is an efficient algorithm for finding these patterns. It defers expensive computation on a pattern until the search actually needs it, which avoids wasted effort. The authors demonstrate the utility of their approach on a variety of datasets, as well as the performance gain from using this algorithm.
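The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a pattern tableau for a functional dependency X -> Y can look like. It rests on several assumptions that do not come from the source. Patterns assign each X attribute either a constant or the wildcard `_`. A pattern's "confidence" is the fraction of its matched tuples that survive if each X-group keeps only its most frequent Y value. The tableau is built greedily until a target fraction of tuples is covered. To echo the idea of deferring expensive work, marginal coverage is evaluated lazily: a pattern's gain is recomputed only when it reaches the top of a priority queue (the standard "lazy greedy" trick). The names `hold_tableau`, `min_conf` and `min_support` are hypothetical, and this is not presented as Data Auditor's actual method.

```python
from collections import Counter, defaultdict
from itertools import product
import heapq

WILDCARD = "_"  # assumed wildcard symbol: matches any value


def matches(pattern, row, lhs):
    """True if every non-wildcard entry of the pattern equals the row's value."""
    return all(p == WILDCARD or p == row[a] for p, a in zip(pattern, lhs))


def confidence(rows, lhs, rhs):
    """Fraction of rows kept if each LHS group keeps only its most common RHS value."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / len(rows)


def candidate_patterns(rows, lhs):
    """All patterns whose entries are a constant seen in the data or the wildcard."""
    domains = [sorted({r[a] for r in rows}) + [WILDCARD] for a in lhs]
    return list(product(*domains))


def hold_tableau(rows, lhs, rhs, min_conf, min_support):
    """Hypothetical greedy tableau for lhs -> rhs with lazily re-evaluated gains."""
    # Keep only non-empty patterns whose matched subset meets the confidence threshold.
    covers = {}
    for p in candidate_patterns(rows, lhs):
        idx = frozenset(i for i, r in enumerate(rows) if matches(p, r, lhs))
        if idx and confidence([rows[i] for i in idx], lhs, rhs) >= min_conf:
            covers[p] = idx
    covered, tableau = set(), []
    target = min_support * len(rows)
    # Max-heap via negated gains; stored gains are upper bounds because gains only shrink.
    heap = [(-len(idx), p) for p, idx in covers.items()]
    heapq.heapify(heap)
    while heap and len(covered) < target:
        _stale, p = heapq.heappop(heap)
        gain = len(covers[p] - covered)  # deferred: recomputed only at the top of the queue
        if gain == 0:
            continue
        if heap and gain < -heap[0][0]:
            heapq.heappush(heap, (-gain, p))  # bound was stale; re-queue with the fresh gain
            continue
        tableau.append(p)
        covered |= covers[p]
    return tableau, len(covered) / len(rows)


if __name__ == "__main__":
    # Toy data: FD [cc, ac] -> city holds for country code 01 but not for 44.
    rows = [
        {"cc": "01", "ac": "908", "city": "MH"},
        {"cc": "01", "ac": "908", "city": "MH"},
        {"cc": "01", "ac": "212", "city": "NYC"},
        {"cc": "01", "ac": "212", "city": "NYC"},
        {"cc": "44", "ac": "131", "city": "EDI"},
        {"cc": "44", "ac": "131", "city": "GLA"},
    ]
    tableau, support = hold_tableau(rows, ["cc", "ac"], "city", 0.9, 0.6)
    print(tableau)            # → [('01', '_')]
    print(round(support, 3))  # → 0.667
```

In this sketch only the marginal-gain computation is deferred. Candidate enumeration and confidence checks are done eagerly and grow exponentially with the number of LHS attributes, so the sketch suits small examples only.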