Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors
The 2-D Discrete Wavelet Transform (DWT) consumes up to 68% of the JPEG2000 encoding time. In this paper, the authors develop efficient implementations of this important kernel on General-Purpose Processors (GPPs), in particular the Pentium 4 (P4). Efficient implementations of the 2-D DWT on the P4 must address three issues. First, the P4 suffers from a problem known as 64K aliasing, which can degrade performance by an order of magnitude. The authors propose two techniques to avoid 64K aliasing, which improve performance by a factor of up to 4.20. Second, a straightforward implementation of vertical filtering incurs many cache misses.
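
To make the two memory effects named above concrete, the following minimal sketch computes the byte addresses that a horizontal and a vertical filter would read in a row-major image. It is an illustration under stated assumptions, not the authors' implementation: the 64-byte cache line, the 4-byte sample size, the image width, the tap count, and all function names are hypothetical, and 64K aliasing is modeled simply as two distinct addresses lying a multiple of 64 KB apart, which is how the condition is commonly described.

```python
# Illustrative sketch only. All constants and names are assumptions
# chosen for the example, not values taken from the paper.
CACHE_LINE_BYTES = 64       # assumed cache line size
ALIAS_BYTES = 64 * 1024     # 64 KB aliasing distance (simplified model)


def byte_address(row, col, width, elem_size=4, base=0):
    """Byte address of image[row][col] in a row-major image."""
    return base + (row * width + col) * elem_size


def lines_touched(addresses, line_bytes=CACHE_LINE_BYTES):
    """Number of distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_bytes for addr in addresses})


def aliases_64k(addr_a, addr_b):
    """True if two distinct addresses lie a multiple of 64 KB apart."""
    return addr_a != addr_b and (addr_a - addr_b) % ALIAS_BYTES == 0


width = 4096   # hypothetical image width in samples
taps = 8       # hypothetical number of filter taps

# Horizontal filtering reads neighbouring samples within one row.
horizontal = [byte_address(0, c, width) for c in range(taps)]
# Vertical filtering reads one sample from each of several consecutive rows.
vertical = [byte_address(r, 0, width) for r in range(taps)]

print("cache lines touched, horizontal:", lines_touched(horizontal))
print("cache lines touched, vertical:  ", lines_touched(vertical))
print("rows 0 and 4 alias at 64 KB:    ", aliases_64k(vertical[0], vertical[4]))
```

With these assumed values, the horizontal taps share a single cache line, while each vertical tap falls in a different line because consecutive rows are 16,384 bytes apart. The same stride places every fourth row exactly 64 KB apart, which is the kind of address pattern the simplified aliasing check flags.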