The University of Tulsa
Traditional loop transformations improve program performance by using memory more efficiently, but they are limited by loop-carried dependencies. Data context switching is an additional transformation technique that improves performance on high-performance configurable processors when loop-carried dependencies are present. Because data context switching requires larger local memories, higher memory bandwidth, and a very large register file, it has not been useful in the inherently energy-constrained embedded systems domain. This paper demonstrates that the problems imposed by data context switching can be solved by adding architectural mechanisms that support cluster pipelining, modulo scheduling, addressing modes for register files, and independent address generation units.