University of California, Los Angeles (Anderson)
Hardware-supported multithreading can mask memory latency by switching execution to threads that are ready to run, which is particularly effective for irregular applications. FPGAs make it possible to build multithreaded data paths customized to each individual application. In this paper, the authors describe how a compiler generates these hardware structures from a subset of C, targeting the Convey HC-2ex machine, and explain how this compilation approach differs from other C-to-HDL compilers. They use the compiler to generate a multithreaded sparse matrix-vector multiplication (SpMV) kernel and compare its performance with existing FPGA implementations and with highly optimized software implementations.