Association for Computing Machinery
Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Execution time can be reduced significantly by using on-chip (scratchpad) memories effectively, in combination with careful software-based data reuse and communication scheduling. The authors present a fully automated C-to-FPGA framework that addresses this problem. The framework implements data reuse through aggressive loop transformation-based program restructuring. It also automatically applies performance-critical optimizations such as task-level parallelization, loop pipelining, and data prefetching.
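
To make the idea of data reuse through loop restructuring concrete, the following is a minimal C sketch, not the authors' framework or its generated code. It assumes a simple 3-point stencil as the workload; the names `stencil_naive`, `stencil_tiled`, and `TILE` are hypothetical, the local array `buf` merely stands in for an on-chip scratchpad, and the returned counters model off-chip reads rather than measuring real hardware traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size; a real tool would derive it from on-chip capacity. */
#define TILE 64

/* Baseline: every output element reads three inputs directly from
 * "off-chip" memory. Returns the modeled number of off-chip reads. */
static size_t stencil_naive(const int *in, int *out, size_t n)
{
    size_t offchip_reads = 0;
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; i++) {
        out[i] = in[i - 1] + in[i] + in[i + 1];
        offchip_reads += 3;
    }
    return offchip_reads;
}

/* Tiled: each tile, plus a one-element halo on both sides, is copied once
 * into a local buffer (standing in for a scratchpad) and reused from there. */
static size_t stencil_tiled(const int *in, int *out, size_t n)
{
    int buf[TILE + 2];
    size_t offchip_reads = 0;
    assert(n >= 3);
    for (size_t start = 1; start + 1 < n; start += TILE) {
        size_t len = n - 1 - start;          /* outputs still to compute */
        if (len > TILE)
            len = TILE;
        /* One burst transfer per tile: len outputs need len + 2 inputs. */
        memcpy(buf, &in[start - 1], (len + 2) * sizeof buf[0]);
        offchip_reads += len + 2;
        for (size_t j = 0; j < len; j++)     /* all reads hit the local buffer */
            out[start + j] = buf[j] + buf[j + 1] + buf[j + 2];
    }
    return offchip_reads;
}
```

Both functions produce the same output, but in this model the tiled version fetches each input about once rather than three times. In an actual high-level synthesis flow, the per-tile copy could additionally be overlapped with computation on the previous tile (data prefetching) and the inner loop pipelined; how that is expressed depends on the tool and is not shown here.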