CUDA and OpenCL Implementations of 3D Fast Wavelet Transform
In this paper, the authors present several implementations of the 3D Fast Wavelet Transform (3D-FWT) in CUDA and OpenCL running on the new Fermi Tesla architecture. They evaluate these proposals and compare them with other implementations optimized for multicore CPUs and for the Nvidia Tesla C870. The CUDA version on the Fermi architecture gives the best results, with speedups over the CPU ranging from 5.3x to 7.4x for different image sizes, and up to 81x when communications are neglected. Meanwhile, OpenCL obtains solid gains, ranging from a factor of 2x on small frame sizes to 3x on larger ones.