INSTRUCT: Space-Efficient Structure for Indexing and Complete Query Management of String Databases
The tremendous expanse of search engines, dictionary and thesaurus storage, and other text mining applications, combined with the popularity of readily available scanning devices and optical character recognition tools, has necessitated efficient storage, retrieval and management of massive text databases for various modern applications. For such applications, the authors propose a novel data structure, INSTRUCT, for efficient storage and management of sequence databases. Their structure uses bit vectors for reusing the storage space for common triplets, and hence, has a very low memory requirement. INSTRUCT efficiently handles prefix and suffix search queries in addition to the exact string search operation by iteratively checking the presence of triplets.