On Finite Memory Universal Data Compression and Classification of Individual Sequences
Source: Israel Institute of Technology
Consider the case where consecutive blocks of N letters of a semi-infinite individual sequence X over a finite-alphabet are being compressed into binary sequences by some one-to-one mapping. No a-priori information about X is available at the encoder, which must therefore adopt a universal data-compression algorithm. It is known that if the universal LZ77 data compression algorithm is successively applied to N-blocks then the best error-free compression, for the particular individual sequence X is achieved as N tends to infinity. The best possible compression that may be achieved by any universal data compression algorithm for finite N-blocks is discussed. It is demonstrated that context tree coding essentially achieves it.