New Jersey Institute of Technology
The utilization wall, caused by the breakdown of threshold voltage scaling, hinders performance gains for new-generation microprocessors. The authors propose an instruction fusion technique for multiscalar and many-core processors to alleviate its impact. With instruction fusion, similar copies of an instruction to be run on multiple pipelines or cores are merged into a single copy for simultaneous execution. Applied to vector code, instruction fusion lets the processor idle early pipeline stages and instruction caches at various times during program execution with minimal performance degradation, while also reducing program size and the required instruction memory bandwidth.
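As an informal illustration of the merging idea only, and not the authors' hardware mechanism, the sketch below treats a program as a sequence of steps in which each lane (pipeline or core) issues one instruction. It assumes that "similar" means textually identical, which is a simplification; the names `fuse_step` and `count_fetches` and the sample program are hypothetical.

```python
def fuse_step(lane_instrs):
    """Group identical instructions issued by different lanes in one step.

    lane_instrs: list indexed by lane id, each entry an instruction string.
    Returns a list of (instruction, [lane ids]) pairs, one per distinct
    instruction, in order of first appearance.
    """
    groups = {}
    for lane, instr in enumerate(lane_instrs):
        groups.setdefault(instr, []).append(lane)
    return list(groups.items())


def count_fetches(program):
    """Return (unfused, fused) instruction-fetch counts for a program.

    Unfused: every lane fetches its own copy of each instruction.
    Fused: each distinct instruction in a step is fetched once and
    shared by all lanes that need it (assumed model, for illustration).
    """
    unfused = sum(len(step) for step in program)
    fused = sum(len(fuse_step(step)) for step in program)
    return unfused, fused


# Hypothetical four-lane vector-style program: lanes mostly agree.
program = [
    ["load", "load", "load", "load"],
    ["add", "add", "add", "add"],
    ["store", "store", "nop", "store"],
]

if __name__ == "__main__":
    for step in program:
        print(fuse_step(step))
    print(count_fetches(program))
```

In this toy model, the fewer distinct instructions the lanes issue per step, the fewer fetches are needed, which is the intuition behind the reduced instruction memory bandwidth claimed for vector code.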