Fast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors
This paper presents algorithms for fast quantile and frequency estimation in large data streams using Graphics Processor Units (GPUs). The authors exploit the high computational power and memory bandwidth of graphics processors and present a novel sorting algorithm that performs rasterization operations on the GPUs. The authors use sorting as the main computational component for histogram approximation and the construction of approximate quantile and frequency summaries. The overall algorithms for numerical statistics computation on data streams are deterministic, applicable to xed or variable sized sliding windows and use a limited memory footprint.