Data Management

Fast Query for Large Treebanks

Date Added: May 2010
Format: PDF

A variety of query systems have been developed for interrogating parsed corpora, or treebanks. With the arrival of efficient, wide-coverage parsers, it is now feasible to create very large databases of trees. However, existing approaches that rely on in-memory search, or on relational or XML database technologies, do not scale up. The authors describe a method for storing, indexing, and querying treebanks that uses an information retrieval engine. Several experiments with a large treebank demonstrate excellent scaling characteristics for a wide range of query types. This work facilitates the curation of much larger treebanks and enables them to be used effectively in a variety of scientific and engineering tasks.
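
To make the idea of querying trees through an information-retrieval-style index more concrete, here is a minimal sketch. It is not the authors' system: it assumes a plain in-memory inverted index that maps each node label to postings of the form (tree id, left leaf offset, right leaf offset, depth), and it answers only one query type, "a node labelled A dominates a node labelled B". The function names (`parse`, `build_index`, `dominates`) and the three-tree sample treebank are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def parse(s):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Words (leaves) are kept as plain strings inside the children list.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def index_tree(index, tree_id, tree):
    """Add one posting (tree_id, left, right, depth) per labelled node."""

    def visit(node, depth, left):
        label, children = node
        right = left
        for child in children:
            if isinstance(child, str):
                right += 1            # a word occupies one leaf position
            else:
                right = visit(child, depth + 1, right)
        index[label].append((tree_id, left, right, depth))
        return right

    visit(tree, 0, 0)


def build_index(treebank):
    """Build the label -> postings inverted index for a list of tree strings."""
    index = defaultdict(list)
    for tree_id, s in enumerate(treebank):
        index_tree(index, tree_id, parse(s))
    return index


def dominates(index, ancestor, descendant):
    """Return sorted ids of trees where `ancestor` properly dominates `descendant`.

    Only the two posting lists are read; the trees themselves are never re-parsed.
    """
    by_tree = defaultdict(list)
    for tid, left, right, depth in index.get(descendant, []):
        by_tree[tid].append((left, right, depth))
    hits = set()
    for tid, left, right, depth in index.get(ancestor, []):
        for d_left, d_right, d_depth in by_tree.get(tid, []):
            # Span containment plus greater depth means proper dominance.
            if left <= d_left and d_right <= right and depth < d_depth:
                hits.add(tid)
                break
    return sorted(hits)


# Hypothetical sample data, not taken from the paper.
TREEBANK = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (NNP Kim)) (VP (VBD saw) (NP (DT a) (NN cat))))",
    "(NP (DT the) (NN end))",
]

index = build_index(TREEBANK)
print(dominates(index, "VP", "NP"))
```

The design point this sketch tries to illustrate is that a structural query touches only the posting lists of the labels it mentions, so the cost depends on how often those labels occur rather than on the total size of the treebank. A production system would keep the postings on disk in a search engine and support many more tree relations than dominance.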