Data Management

Making Aggregation Work in Uncertain and Probabilistic Databases

Date Added: Mar 2010
Format: PDF

The authors describe how the Trio system handles aggregation over uncertain and probabilistic data. Because exact aggregation in uncertain databases can produce exponentially sized results, they provide three alternatives: a low bound on the aggregate value, a high bound on the aggregate value, and the expected value. Each variant returns a single result instead of a set of possible results, and all three are generally efficient to compute for both full-table and grouped aggregation queries. The authors provide formal definitions and semantics, and they describe the open-source implementation for single-table aggregation queries. They study the performance and scalability of the algorithms through experiments over a large synthetic data set, and they present preliminary results on aggregation over joins.
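
To make the three variants concrete, here is a minimal Python sketch of SUM over a toy uncertain table. It is not Trio's implementation or query syntax. The data model is an assumption made for illustration: each uncertain tuple is a list of mutually exclusive (value, probability) alternatives, tuples are independent of each other, and probabilities that sum to less than 1 mean the tuple may be absent. All function names (outcomes, sum_low, sum_high, sum_expected, sum_exact) are hypothetical.

```python
from itertools import product

# Assumed toy model (not Trio's actual data structures): one list of
# mutually exclusive (value, probability) alternatives per uncertain tuple.
table = [
    [(10, 0.5), (20, 0.5)],  # value is 10 or 20, equally likely
    [(5, 0.8)],              # value is 5, or the tuple is absent (prob. 0.2)
    [(7, 1.0)],              # a certain tuple
]

def outcomes(alternatives):
    """All (contribution to SUM, probability) outcomes for one tuple."""
    result = list(alternatives)
    missing = 1.0 - sum(p for _, p in alternatives)
    if missing > 1e-12:
        result.append((0.0, missing))  # an absent tuple contributes 0
    return result

def sum_low(table):
    """Smallest SUM in any possible world: pick each tuple's minimum."""
    return sum(min(v for v, _ in outcomes(t)) for t in table)

def sum_high(table):
    """Largest SUM in any possible world: pick each tuple's maximum."""
    return sum(max(v for v, _ in outcomes(t)) for t in table)

def sum_expected(table):
    """Expected SUM, using linearity of expectation (one pass)."""
    return sum(v * p for t in table for v, p in t)

def sum_exact(table):
    """Full distribution of SUM by enumerating every possible world.

    The number of worlds is the product of the per-tuple outcome counts,
    so this grows exponentially with the number of uncertain tuples.
    """
    dist = {}
    for world in product(*(outcomes(t) for t in table)):
        total = sum(v for v, _ in world)
        prob = 1.0
        for _, p in world:
            prob *= p
        dist[total] = dist.get(total, 0.0) + prob
    return dist

low, high, expected = sum_low(table), sum_high(table), sum_expected(table)
exact = sum_exact(table)
```

For this toy table the sketch gives a low bound of 17, a high bound of 32, and an expected value of 26, each computed in a single pass over the tuples. The exact variant already enumerates four possible worlds for three tuples, which illustrates why single-valued alternatives are attractive at scale.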