Association for Computing Machinery
Kernel density estimates are important for a broad variety of applications. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. The authors propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. Their algorithms do not require knowledge of the kernel or its band-width parameter and are easily parallelizable. They demonstrate how to implement their ideas in a centralized setting and in MapReduce, although their algorithms are applicable to any large-scale data processing framework.