Modular Data Compression to Optimally Locate Regular Segments in Sequences: Application to DNA Sequence Analysis

Date Added: Jan 2010
Format: PDF

A new method for locating regular segments in sequences is presented, based on the Minimum Description Length (MDL) criterion. Given a lossless compressor that achieves size reduction by exploiting a regularity, the algorithm TurboOptLift quickly locates the segments where the regularity is probably present and those where it is not. The resulting segmentation is optimal from an MDL viewpoint. The paper applies the method to the problem of locating approximate tandem repeats in DNA sequences.
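
To make the MDL idea concrete, here is a minimal sketch of how a segmentation can be chosen by minimizing total code length. It is not TurboOptLift, whose details the abstract does not give. It assumes a toy setup: a "plain" code that spends 2 bits per DNA symbol, a "periodic" code that predicts each symbol from the one `period` positions earlier, and a fixed cost in bits for every switch between the two codes. The function name `mdl_segment` and the parameters `p_match` and `switch_bits` are hypothetical, chosen only for illustration.

```python
import math


def mdl_segment(seq, period, p_match=0.9, switch_bits=8.0):
    """Label each position 0 (plain) or 1 (periodic) so total bits are minimal.

    Toy model (an assumption, not the paper's compressor):
      - plain code: 2 bits per symbol (uniform over A, C, G, T)
      - periodic code: a symbol equal to seq[i - period] costs -log2(p_match),
        any other symbol costs -log2((1 - p_match) / 3)
      - each change of code costs switch_bits
    """
    n = len(seq)
    if n == 0:
        return [], 0.0

    plain_bits = 2.0
    match_bits = -math.log2(p_match)
    mismatch_bits = -math.log2((1.0 - p_match) / 3.0)

    def cost(i, state):
        # The periodic code has no context for the first `period` symbols,
        # so it falls back to the plain cost there.
        if state == 0 or i < period:
            return plain_bits
        return match_bits if seq[i] == seq[i - period] else mismatch_bits

    # best[s]: minimal bits for labeling seq[:i+1] with position i in state s.
    # Starting in the periodic state is charged one switch.
    best = [cost(0, 0), cost(0, 1) + switch_bits]
    back = []  # back[i-1][s]: best previous state when position i is in state s

    for i in range(1, n):
        new = [0.0, 0.0]
        choice = [0, 0]
        for s in (0, 1):
            stay = best[s]
            move = best[1 - s] + switch_bits
            if stay <= move:
                new[s], choice[s] = stay + cost(i, s), s
            else:
                new[s], choice[s] = move + cost(i, s), 1 - s
        back.append(choice)
        best = new

    # Trace the cheapest labeling backwards from the last position.
    state = 0 if best[0] <= best[1] else 1
    total = best[state]
    labels = [state]
    for choice in reversed(back):
        state = choice[state]
        labels.append(state)
    labels.reverse()
    return labels, total


if __name__ == "__main__":
    # A non-repetitive prefix followed by a period-3 tandem repeat.
    sequence = "ACGTTGCATGCA" + "AAC" * 15
    labels, bits = mdl_segment(sequence, period=3)
    print("".join(map(str, labels)))
    print(f"{bits:.1f} bits versus {2 * len(sequence)} bits for the plain code")
```

Under this toy model the dynamic program returns a labeling with the smallest total code length, which is the sense in which a segmentation can be called MDL-optimal. A real compressor would replace the fixed per-symbol costs, and approximate repeats with insertions or deletions would need a richer periodic model than the exact-offset comparison used here.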