Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs

Provided by: Institute of Electrical & Electronic Engineers
Topic: Hardware
Format: PDF
Trading communication for redundant computation can increase the silicon efficiency of FPGAs and GPUs when accelerating communication-bound sparse iterative solvers. Unrolling k iterations of the iterative solver can provide an O(k) reduction in communication cost, but how far to unroll depends on the underlying architecture, its memory model, and the growth in redundant computation. This paper presents a systematic procedure for selecting this algorithmic parameter k, which sets the communication-computation trade-off on hardware accelerators such as FPGAs and GPUs.
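
To make the trade-off concrete, here is a minimal Python sketch. It is not the paper's procedure. It assumes a tridiagonal matrix with constant bands, a one-dimensional row partition, and a toy linear cost model. The function names and the cost parameters (`latency`, `word_cost`, `row_cost`) are hypothetical and chosen only for illustration. A partition fetches a halo of width k once, applies the matrix k times locally, and discards the halo entries that became stale. This replaces k exchanges with one, at the price of recomputing the halo rows.

```python
def spmv_tridiag(x, lo=1.0, di=-2.0, up=1.0):
    """One y = A x step for a tridiagonal A with constant bands.

    Entries outside the vector are treated as zero.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = lo * left + di * x[i] + up * right
    return y


def k_steps_reference(x, k):
    """Apply A to the whole vector k times (one exchange per step)."""
    for _ in range(k):
        x = spmv_tridiag(x)
    return x


def k_steps_blocked(x, k, start, stop):
    """Rows [start, stop) of A^k x computed from a single halo fetch.

    The window holds the owned rows plus up to k halo rows per side.
    Each local step corrupts one more entry at every artificial cut,
    so after k steps only the halo is wrong and the owned rows are exact.
    """
    n = len(x)
    lo_idx = max(0, start - k)
    hi_idx = min(n, stop + k)
    local = x[lo_idx:hi_idx]          # the only communication
    for _ in range(k):
        local = spmv_tridiag(local)   # includes redundant halo rows
    offset = start - lo_idx
    return local[offset:offset + (stop - start)]


def best_k(m, latency, word_cost, row_cost, k_max):
    """Pick k minimizing cost per iteration under an assumed toy model.

    Assumption: one message of 2k halo words per k steps, and each
    step recomputes the full window of m + 2k rows.
    """
    def per_iteration(k):
        comm = latency + 2 * k * word_cost
        compute = k * (m + 2 * k) * row_cost
        return (comm + compute) / k
    return min(range(1, k_max + 1), key=per_iteration)
```

Under this assumed model the cost per iteration is latency/k + 2·word_cost + (m + 2k)·row_cost, so a larger k amortizes message latency while the redundant term grows linearly in k. The window here stays at full width on every step for simplicity; a tighter version would shrink it by one row per side after each step.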
