Data Management

Making Aggregation Work in Uncertain and Probabilistic Databases

Executive Summary

The authors describe how aggregation is handled in the Trio system for uncertain and probabilistic data. Because exact aggregation over an uncertain database can produce a result of exponential size, they provide three alternatives: a low bound on the aggregate value, a high bound on the value, and the expected value. Each variant returns a single result instead of a set of possible results, and each is generally efficient to compute for both full-table and grouped aggregation queries. The authors give formal definitions and semantics, describe the open-source implementation for single-table aggregation queries, and study the performance and scalability of the algorithms through experiments over a large synthetic data set. They also report preliminary results on aggregation over joins.
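
To make the contrast between exact aggregation and the three single-value variants concrete, the following is a minimal sketch for SUM only. It is not Trio's implementation or query syntax. It assumes a simplified model in which each uncertain tuple is a list of mutually exclusive (value, probability) alternatives, tuples are independent of one another, a tuple whose probabilities sum to less than 1 may be absent, and an absent tuple contributes 0 to the sum. All function names are hypothetical.

```python
from itertools import product
from math import prod

# Assumed model: a table is a list of uncertain tuples; each tuple is a list
# of (value, probability) alternatives, of which at most one is true.

def with_absent(alts):
    """Add an explicit 'absent' alternative (contributing 0 to SUM) when the
    listed probabilities leave some mass unassigned."""
    p_absent = 1.0 - sum(p for _, p in alts)
    return alts + [(0, p_absent)] if p_absent > 1e-12 else alts

def sum_low(table):
    # Smallest possible SUM: take the minimum alternative of every tuple.
    return sum(min(v for v, _ in with_absent(t)) for t in table)

def sum_high(table):
    # Largest possible SUM: take the maximum alternative of every tuple.
    return sum(max(v for v, _ in with_absent(t)) for t in table)

def sum_expected(table):
    # Linearity of expectation: add up each tuple's expected contribution.
    return sum(v * p for t in table for v, p in t)

def sum_exact(table):
    """Full distribution of SUM by enumerating every possible world.
    The number of worlds is the product of the per-tuple alternative counts,
    which is why the exact result can grow exponentially."""
    dist = {}
    for world in product(*(with_absent(t) for t in table)):
        total = sum(v for v, _ in world)
        dist[total] = dist.get(total, 0.0) + prod(p for _, p in world)
    return dist

table = [
    [(10, 0.5), (20, 0.5)],  # value is either 10 or 20
    [(5, 0.4)],              # present with probability 0.4, else absent
    [(7, 1.0)],              # certain tuple
]

print(sum_low(table), sum_high(table), sum_expected(table))
print(sum_exact(table))
```

In this sketch the three single-value variants each make one pass over the table, while the exact variant enumerates every combination of alternatives. Other aggregates such as AVG or COUNT need their own treatment and are not covered here.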

