An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse Graph-Oriented Workloads
Parallel graph-oriented applications expressed in the Bulk-Synchronous Parallel (BSP) and Token Dataflow compute models generate highly structured communication workloads from messages propagating along graph edges. The authors can statically expose this structure to traffic compilers and optimization tools, which reshape and reduce traffic for higher performance (or lower area, lower energy, lower cost). Such offline traffic optimization eliminates the need for complex run-time NoC hardware and enables lightweight, scalable NoCs. They perform load balancing, placement, fan-out routing, and fine-grained synchronization to optimize their workloads for large networks of up to 2025 parallel elements for the BSP model and 25 parallel elements for the Token Dataflow model.
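
To make the idea of statically exposed traffic concrete, the following minimal sketch counts the NoC messages generated by one communication step of a small graph, first with one message per cross-PE edge and then with a simplified form of fan-out routing. It is an illustration only, not the authors' compiler: the function `count_messages`, the `placement` mapping, and the assumption that fan-out routing sends a single message per (source node, destination PE) pair are all hypothetical simplifications introduced here.

```python
from collections import defaultdict


def count_messages(edges, placement, fanout_routing=False):
    """Count NoC messages for one step in which every node sends on all out-edges.

    edges:      list of (src_node, dst_node) pairs (hypothetical input format)
    placement:  dict mapping node id -> processing element (PE) id
    """
    if not fanout_routing:
        # Baseline: one message per edge whose endpoints sit on different PEs.
        return sum(1 for src, dst in edges if placement[src] != placement[dst])

    # Assumed fan-out scheme: one message per (source node, remote PE) pair;
    # the receiving PE is assumed to deliver it to all local destinations.
    remote_pes = defaultdict(set)
    for src, dst in edges:
        if placement[src] != placement[dst]:
            remote_pes[src].add(placement[dst])
    return sum(len(pes) for pes in remote_pes.values())


# Node 0 lives on PE 0 and fans out to nodes 1, 2, 3, all placed on PE 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
placement = {0: 0, 1: 1, 2: 1, 3: 1}

baseline = count_messages(edges, placement)
reduced = count_messages(edges, placement, fanout_routing=True)
print(baseline, reduced)
```

Because both the edge list and the placement are known before execution, a count like this can be computed offline, which is what allows traffic to be reshaped without extra run-time hardware in the NoC.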