University of Texas at Arlington
Chip Multiprocessors are becoming common as the cost of increasing chip power begins to limit single core performance. The most power efficient CMP consists of low power in-order cores. However, performance on such a processor is low unless the workload is nearly completely parallelized, which depending on the workload can be impossible or require significant programmer effort. This paper argues that the programmer effort required to parallelize an application can be reduced if the underlying architecture promises faster execution of the serial portion of an application.