Association for Computing Machinery
Histogram construction is a fundamental problem in data management, and a good histogram supports numerous mining operations. Recent work has extended histograms to probabilistic data. However, constructing histograms for probabilistic data can be extremely expensive, and existing studies suffer from limited scalability. This paper designs novel approximation methods to construct scalable histograms on probabilistic data. The authors show that their methods provide constant approximations compared to the optimal histograms produced by the state-of-the-art in the worst case.