A New Method for Indexing Genomes Using On-Disk Suffix Trees

Executive Summary

The authors propose a new method for building persistent suffix trees to index genomic data. Their algorithm, DiGeST (Disk-Based Genomic Suffix Tree), improves significantly on previous work by reducing random access to the input string and by performing only two passes over the data on disk. DiGeST is based on the two-phase multi-way merge sort paradigm and uses a concise binary representation of the DNA alphabet. The authors also report that their method scales to larger genomic data sets than earlier methods could handle.
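
To make the two ideas named in the summary more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 2-bit code per base (A=0, C=1, G=2, T=3) as one possible "concise binary representation", and it illustrates the two-phase multi-way merge sort paradigm on suffix start positions: phase one sorts fixed-size runs, phase two merges all runs in a single pass. The function names, the `run_size` parameter, and the use of in-memory lists in place of on-disk runs are all assumptions made for illustration. The result is a sorted list of suffixes, not a suffix tree.

```python
import heapq

# Hypothetical 2-bit code for the DNA alphabet (an assumption, not taken from the paper).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(dna):
    """Pack a DNA string into one integer, 2 bits per base, first base most significant."""
    value = 0
    for base in dna:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from its packed integer form."""
    return "".join(
        BASES[(value >> (2 * (length - 1 - k))) & 3] for k in range(length)
    )


def sorted_runs(dna, run_size):
    """Phase 1: split suffix start positions into runs and sort each run on its own.

    In a disk-based setting each run would be written to disk; lists stand in here.
    """
    runs = []
    for lo in range(0, len(dna), run_size):
        positions = range(lo, min(lo + run_size, len(dna)))
        runs.append(sorted(positions, key=lambda i: dna[i:]))
    return runs


def merge_runs(dna, runs):
    """Phase 2: multi-way merge of all sorted runs in a single pass."""
    return list(heapq.merge(*runs, key=lambda i: dna[i:]))


def sorted_suffixes(dna, run_size=3):
    """Return suffix start positions in lexicographic order of their suffixes."""
    return merge_runs(dna, sorted_runs(dna, run_size))
```

For example, `sorted_suffixes("GATTACA")` orders the seven suffixes of the string by running the two phases in sequence, and `unpack(pack("ACGT"), 4)` round-trips a string through the 2-bit form. The sketch compares suffixes as Python string slices for clarity; a space-conscious implementation would compare the packed form instead.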

  • Format: PDF
  • Size: 493.3 KB