Adaptive Cluster Distance Bounding for High-Dimensional Indexing
The authors consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a clustering framework. They note that indexing by "Vector Approximation" (VA-File), which was proposed as a technique to combat the "Curse of Dimensionality," employs scalar quantization, and hence necessarily ignores dependencies across dimensions, which represents a source of sub-optimality. Clustering, on the other hand, exploits inter-dimensional correlations and is thus a more compact representation of the data set. However, existing methods to prune irrelevant clusters are based on bounding hyper-spheres and/or bounding rectangles, whose lack of tightness compromises their efficiency in exact nearest neighbor search. They propose a new cluster-adaptive distance bound based on separating hyper-plane boundaries of Voronoi clusters to complement their cluster based index.