Scalable Histograms on Large Probabilistic Data

Provided by: Association for Computing Machinery
Topic: Big Data
Format: PDF
Histogram construction is a fundamental problem in data management, and a good histogram supports numerous mining operations. Recent work has extended histograms to probabilistic data. However, constructing histograms for probabilistic data can be extremely expensive, and existing studies suffer from limited scalability. This paper designs novel approximation methods to construct scalable histograms on probabilistic data. The authors show that their methods provide constant approximations compared to the optimal histograms produced by the state-of-the-art in the worst case.

Find By Topic