University of Teramo
Increasing on-chip wire delay, combined with the distributed nature of processing elements, makes instruction scheduling crucial for tiled data flow architectures. The authors' analysis shows that two measures can significantly improve application performance: carefully placing the most frequently executed sections of an application, and directly addressing resource contention. The former reduces operand network latency, while the latter reduces stalls caused by contention for processing elements. The authors augment one of the most recent instruction scheduling algorithms, hierarchical instruction scheduling, so that it treats loops as first-class entities in placement decisions.
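The trade-off between operand network latency and processing-element contention can be illustrated with a toy greedy placer. This is a hypothetical sketch, not the authors' augmented hierarchical instruction scheduling. It assumes a 2-D mesh of tiles and Manhattan-distance hop counts for operand latency. It also assumes per-instruction execution frequencies from a profile, where loop-body instructions carry high counts. The names `Instr`, `hops`, `place` and `contention_weight` are illustrative only. Hot instructions are placed first, and each one goes on the tile that minimizes frequency-weighted hops to its already-placed neighbours plus a penalty for sharing a tile with other hot instructions.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    freq: int                                   # hypothetical profiled execution count
    preds: list = field(default_factory=list)   # names of producer instructions

def hops(a, b):
    """Manhattan distance between two tiles on an assumed 2-D mesh operand network."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def place(instrs, rows, cols, contention_weight=1.0):
    """Hypothetical greedy placer: hottest instructions first, each on the tile
    minimizing frequency-weighted operand hops plus a contention penalty."""
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    load = {t: 0 for t in tiles}        # frequency-weighted occupancy of each tile
    freq = {i.name: i.freq for i in instrs}
    # Undirected neighbour sets, so an edge is costed by whichever end is placed second.
    nbrs = {i.name: set(i.preds) for i in instrs}
    for i in instrs:
        for p in i.preds:
            nbrs.setdefault(p, set()).add(i.name)
    where = {}
    # Stable sort by descending frequency: loop-body instructions claim tiles first.
    for ins in sorted(instrs, key=lambda i: -i.freq):
        def cost(t):
            # Operand latency: hops to placed neighbours, weighted by edge traffic
            # (approximated as the smaller of the two execution frequencies).
            latency = sum(hops(t, where[n]) * min(ins.freq, freq[n])
                          for n in nbrs[ins.name] if n in where)
            # Contention: co-locating with hot instructions is penalized.
            contention = contention_weight * load[t] * ins.freq
            return latency + contention
        best = min(tiles, key=cost)     # ties resolve to the first tile in row-major order
        where[ins.name] = best
        load[best] += ins.freq
    return where

# Usage: a three-instruction hot loop body plus one cold instruction on a 2x2 mesh.
loop = [Instr("a", 100), Instr("b", 100, ["a"]), Instr("c", 100, ["b"])]
print(place(loop + [Instr("x", 1, ["a"])], rows=2, cols=2))
# → {'a': (0, 0), 'b': (0, 1), 'c': (1, 1), 'x': (1, 0)}
```

In this example the loop's producer-consumer chain lands on adjacent tiles (one hop per edge), and no two instructions share a tile, so the cold instruction is pushed to the remaining free tile rather than contending with the loop body. Raising `contention_weight` trades extra hops for less sharing.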