Efficient and Effective KNN Sequence Search with Approximate n-grams
In this paper, the authors address the problem of finding the K-nearest neighbors (KNN) in sequence databases under the edit distance. Unlike most existing work, which relies on short, exact n-gram matches within a filter-and-refine framework for KNN sequence search, the authors' approach uses longer but approximate n-gram matches as the basis for pruning KNN candidates. Based on this idea, they devise a pipeline framework over a two-level index for KNN search in a sequence database. By coupling this framework with several efficient filtering strategies, i.e.