Accelerating Double Precision Floating-point Hessenberg Reduction on FPGA and Multicore Architectures
Double precision floating-point performance is critical if hardware acceleration technologies are to be adopted by domain scientists. In this paper, the authors use the Hessenberg reduction to demonstrate the potential of FPGAs and GPUs for obtaining satisfactory double precision floating-point performance. Currently, a 2.26 GHz Xeon (Nehalem) CPU outperforms a Xilinx Virtex4LX200 FPGA by a factor of 3.6. However, given higher clock frequencies, more hardware resources, and local memory banks, FPGAs have the potential to outperform multicore CPUs in the near future. On the GPU side, a GTX 480 (Fermi) achieves a 19.4x speedup over the Xeon CPU. Based on the current trend, GPUs will keep widening their advantage over both FPGAs and CPUs in double precision floating-point performance.