Data Management

On Scaling Up Sensitive Data Auditing

In this paper, the authors study the following problem: given a query and a set of sensitive records, find the subset of records "accessed" by the query. The notion of a query accessing a single record is adopted from prior work. In several scenarios the number of sensitive records is large (in the millions). The novel challenge addressed in this paper is to develop a general-purpose solution for complex SQL that scales with the number of sensitive records. The authors propose efficient techniques that improve upon straightforward alternatives by orders of magnitude, and their empirical evaluation over TPC-H benchmark data illustrates the benefits of these techniques.
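To make the problem statement concrete, here is a minimal sketch of what a straightforward per-record baseline could look like. It is not the authors' technique. The summary does not spell out the definition of "accessed", so the sketch assumes a deletion-based one: a record counts as accessed if removing it changes the query result. The table `patients`, its columns, and the helper names `run` and `accessed_records` are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import Counter


def run(conn, query):
    # Compare results as multisets so that row order does not matter.
    return Counter(conn.execute(query).fetchall())


def accessed_records(conn, query, table, key, sensitive_ids):
    """Naive baseline: re-run the query once per sensitive record.

    ASSUMPTION: a record is "accessed" if deleting it changes the result.
    `table` and `key` are trusted identifiers, not user input.
    """
    baseline = run(conn, query)
    accessed = set()
    for rid in sensitive_ids:
        conn.execute("SAVEPOINT audit")
        conn.execute(f"DELETE FROM {table} WHERE {key} = ?", (rid,))
        if run(conn, query) != baseline:
            accessed.add(rid)
        # Undo the deletion so the next record is tested on the full table.
        conn.execute("ROLLBACK TO audit")
        conn.execute("RELEASE audit")
    return accessed


# Hypothetical toy data; isolation_level=None leaves transaction control
# to the explicit SAVEPOINT statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, disease TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "flu"), (2, "cold"), (3, "flu"), (4, "cold")],
)

query = "SELECT COUNT(*) FROM patients WHERE disease = 'flu'"
result = accessed_records(conn, query, "patients", "id", [1, 2, 3])
print(sorted(result))
```

Under these assumptions the sketch executes the query once per sensitive record, so its cost grows linearly with the number of sensitive records. With millions of records, that per-record cost is the scaling problem the paper sets out to address.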