Random samples are common in data stream applications, whether because of limitations in data sources and transmission lines or because of load-shedding policies. Here the authors introduce a formal error model and show that, besides providing accurate estimates, it improves the accuracy of query answers by exploiting past statistics. The method is general, robust in the presence of concept drift, and minimises the uncertainty introduced by sampling at negligible cost in time and space. They describe how the method is applied and report results for SQL window aggregates, statistical aggregates such as quantiles, and data mining functions such as k-means clustering and naive Bayesian classifiers.
- Format: PDF
- Size: 295.1 KB