Computational Statistics and Data Analysis

Date Added: Jan 2010
Format: PDF

The authors describe a fast, data-driven bandwidth selection procedure for Kernel Conditional Density Estimation (KCDE). Specifically, they give a Monte Carlo dual-tree algorithm for efficient, error-controlled approximation of a cross-validated likelihood objective. While exact evaluation of this objective has an unscalable O(n^2) computational cost, the method is practical and shows speedup factors as high as 286,000 when applied to real multivariate datasets containing up to one million points. In absolute terms, computation times are reduced from months to minutes. This enables applications at much greater scale than previously possible.
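
To make the O(n^2) cost concrete, here is a minimal Python sketch of a brute-force leave-one-out cross-validated log-likelihood for KCDE. It is an illustration under stated assumptions, not the authors' method: the exact form of the objective, the isotropic Gaussian kernels, and the names `gaussian_kernel_matrix` and `loo_cv_log_likelihood` are not taken from the abstract. It does not implement the Monte Carlo dual-tree approximation; it only shows the exact computation that such an approximation would stand in for.

```python
import numpy as np


def gaussian_kernel_matrix(points, bandwidth):
    """Pairwise isotropic Gaussian kernel values K_h(p_i - p_j), shape (n, n).

    Assumption: one scalar bandwidth shared by all dimensions.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]  # treat a 1-D array as n points in one dimension
    dim = points.shape[1]
    diffs = points[:, None, :] - points[None, :, :]  # all n*n pairs: the quadratic step
    sq_dists = np.sum(diffs ** 2, axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (-dim / 2.0)
    return norm * np.exp(-0.5 * sq_dists / bandwidth ** 2)


def loo_cv_log_likelihood(x, y, h_x, h_y):
    """Mean leave-one-out log conditional density of y_i given x_i.

    Assumed objective (hypothetical form, not quoted from the paper):
        (1/n) * sum_i log( sum_{j != i} K_hy(y_i - y_j) K_hx(x_i - x_j)
                           / sum_{j != i} K_hx(x_i - x_j) )
    """
    k_x = gaussian_kernel_matrix(x, h_x)
    k_y = gaussian_kernel_matrix(y, h_y)
    np.fill_diagonal(k_x, 0.0)  # drop the j == i terms (leave-one-out)
    numer = np.sum(k_x * k_y, axis=1)
    denom = np.sum(k_x, axis=1)
    tiny = np.finfo(float).tiny  # guard against log(0) when the kernels underflow
    log_ratio = np.log(np.maximum(numer, tiny)) - np.log(np.maximum(denom, tiny))
    return float(np.mean(log_ratio))
```

Under these assumptions, bandwidth selection would amount to evaluating `loo_cv_log_likelihood(x, y, h_x, h_y)` over a grid of candidate `(h_x, h_y)` pairs and keeping the pair with the highest score. Each evaluation touches all n^2 pairs of points, so the cost of the full search grows quadratically with the dataset size.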