Date Added: Jun 2009
This paper presents a helper-thread prefetching scheme designed for loosely coupled processors, such as a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have the advantage that the application and helper threads do not contend for resources such as the processor and the L1 cache, which preserves the speed of the application. However, inter-processor communication is expensive in such a system, and the authors present techniques to reduce this cost. Their approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead of the application thread the helper thread may execute.
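
To make the bounded run-ahead idea concrete, here is a minimal sketch in Python. It is not the authors' mechanism: the function name `run_with_helper`, the parameter `max_ahead`, and the use of a condition variable are all assumptions chosen for readability. Because the abstract stresses that inter-processor communication is expensive, a real implementation would presumably synchronize far less often than once per iteration. The helper merely records which iterations it reached, as a stand-in for issuing prefetches.

```python
import threading


def run_with_helper(n_iters, max_ahead):
    """Run a loop alongside a helper thread that may lead by at most max_ahead iterations.

    Hypothetical illustration only; names and structure are not from the paper.
    Returns (largest lead observed, list of iterations the helper visited).
    """
    cond = threading.Condition()
    state = {"app_iter": 0, "max_lead": 0}
    touched = []  # stand-in for the prefetches the helper would issue

    def helper():
        for i in range(n_iters):
            with cond:
                # Block while the helper would be more than max_ahead iterations ahead.
                cond.wait_for(lambda: i - state["app_iter"] <= max_ahead)
                lead = i - state["app_iter"]
                state["max_lead"] = max(state["max_lead"], lead)
            touched.append(i)  # placeholder for prefetching data of iteration i

    def application():
        for i in range(n_iters):
            # The real work of iteration i would happen here.
            with cond:
                state["app_iter"] = i + 1  # publish progress to the helper
                cond.notify_all()

    threads = [threading.Thread(target=helper), threading.Thread(target=application)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_lead"], touched
```

Calling `run_with_helper(200, 4)` returns the largest lead the helper achieved, which never exceeds 4, together with the iterations it visited. The design choice the sketch highlights is that the helper only ever waits on the application's published progress, so the application thread is never stalled by the helper.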