Fitting FFT Onto the G80 Architecture

There are two sources of motivation for this paper; the first is the recent success in running matrix-matrix multiply on G80 GPUs. The authors present a novel implementation of FFT on the GeForce 8800 GTX that achieves 144 Gflop/s, nearly 3x faster than the best rate achieved by the vendor’s current numerical libraries. This performance comes from using the Cooley-Tukey framework to exploit the hardware's capabilities, such as its massive vector register files and small on-chip local storage. The authors also consider the performance of the FFT on a few other platforms.
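The Cooley-Tukey framework splits one length-N transform, with N = n1 * n2, into many small independent transforms joined by twiddle-factor multiplications and a data exchange. The Python sketch below shows a single such step. It is not the authors' GPU kernel. The function names `dft` and `cooley_tukey` are hypothetical, and the mapping of the small transforms to per-thread registers and of the data exchange to on-chip local storage is an illustrative assumption about how the hardware features above could be used.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT: the 'small transform' in this sketch and the reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def cooley_tukey(x, n1, n2):
    """One Cooley-Tukey step for N = n1 * n2.

    Input index  j = n2 * j1 + j2, output index k = k1 + n1 * k2.
    Step 1: n2 independent length-n1 DFTs over strided subsequences
            (assumption: on a GPU each could live in one thread's registers).
    Step 2: multiply by twiddle factors exp(-2*pi*i * j2 * k1 / N).
    Step 3: n1 independent length-n2 DFTs across the step-1 results
            (assumption: this transpose is the exchange through local storage).
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: inner[j2][k1] = DFT_n1 of x[j2], x[j2 + n2], ..., x[j2 + (n1-1)*n2]
    inner = [dft(x[j2::n2]) for j2 in range(n2)]

    # Step 2: apply twiddle factors w_N^(j2 * k1)
    for j2 in range(n2):
        for k1 in range(n1):
            inner[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: for each k1, a length-n2 DFT over j2, written to k1 + n1 * k2
    out = [0j] * n
    for k1 in range(n1):
        column = [inner[j2][k1] for j2 in range(n2)]
        for k2, value in enumerate(dft(column)):
            out[k1 + n1 * k2] = value
    return out
```

For any input of length 16, `cooley_tukey(x, 4, 4)` should agree with `dft(x)` to within floating-point rounding. Applying the same split recursively to the small transforms gives the usual O(N log N) FFT; choosing how large n1 and n2 are is where register and local-storage capacity would come into play.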

Resource Details

Provided by: UC Regents
Topic: Hardware
Format: PDF