Modular Data Compression to Optimally Locate Regular Segments in Sequences: Application to DNA Sequence Analysis

Executive Summary

A new method for locating regular segments in sequences is presented. It is based on the Minimum Description Length (MDL) criterion. If a lossless compressor achieves size reduction by exploiting a regularity, the algorithm TurboOptLift quickly locates the segments where the regularity is probably present and those where it is not. The resulting segmentation is optimal from an MDL viewpoint. The paper applies the method to the problem of locating approximate tandem repeats in DNA sequences.
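
To illustrate the MDL idea behind this kind of segmentation, the following minimal sketch may help. It is not TurboOptLift, whose details are not given in this summary. It is a plain two-state dynamic program under assumed inputs: `cost_regular` and `cost_plain` are hypothetical per-symbol code lengths (in bits) under a regularity-exploiting compressor and a baseline encoding, and `switch_bits` is an assumed cost for signalling each segment boundary. The function name `mdl_segment` is also illustrative.

```python
from typing import List, Sequence


def mdl_segment(
    cost_regular: Sequence[float],
    cost_plain: Sequence[float],
    switch_bits: float,
) -> List[int]:
    """Label each position 1 (regular) or 0 (plain) so that the total
    description length (symbol costs plus boundary costs) is minimal.

    Illustrative sketch only; all inputs are assumed to be given.
    """
    n = len(cost_regular)
    if n == 0:
        return []
    costs = (cost_plain, cost_regular)  # index 0 = plain, 1 = regular
    best = [costs[0][0], costs[1][0]]   # cheapest total ending in each state
    back = []                           # back[i-1][s] = previous state for s at i
    for i in range(1, n):
        new_best = [0.0, 0.0]
        choice = [0, 0]
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] + switch_bits
            if stay <= switch:
                new_best[s], choice[s] = stay + costs[s][i], s
            else:
                new_best[s], choice[s] = switch + costs[s][i], 1 - s
        best = new_best
        back.append(choice)
    # Trace back from the cheaper final state.
    state = 0 if best[0] <= best[1] else 1
    labels = [state]
    for choice in reversed(back):
        state = choice[state]
        labels.append(state)
    labels.reverse()
    return labels


# Hypothetical code lengths: the "regular" model is cheap only in the middle.
plain = [2.0] * 10
regular = [4.0, 4.0, 4.0, 0.5, 0.5, 0.5, 0.5, 4.0, 4.0, 4.0]
labels = mdl_segment(regular, plain, switch_bits=1.0)
```

In this toy input, marking positions 3 to 6 as regular costs 16 bits in total, against 20 bits for encoding everything with the baseline, so the minimal description places the regular segment there. Raising `switch_bits` makes short segments less worthwhile, which is the usual MDL trade-off between model complexity and fit.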

  • Format: PDF
  • Size: 100.9 KB