Data Management

Space-Constrained Gram-Based Indexing for Efficient Approximate String Search

Free registration required

Executive Summary

Answering approximate queries on string collections is important in applications such as data cleaning, query relaxation, and spell checking, where inconsistencies and errors exist in user queries as well as data. Many existing algorithms use gram-based inverted-list indexing structures to answer approximate string queries. These indexing structures are "Notoriously" large compared to the size of their original string collection. In this paper, the authors study how to reduce the size of such an indexing structure to a given amount of space, while retaining efficient query processing. They first study how to adopt existing inverted-list compression techniques to solve their problem.

  • Format: PDF
  • Size: 536.02 KB