Federation: Boosting Per-Thread Performance of Throughput-Oriented Manycore Architectures
Manycore architectures designed for parallel workloads are likely to use simple, highly multi-threaded, in-order cores. This maximizes throughput, but only when there are enough threads to keep the hardware utilized. For applications or phases with more limited parallelism, the authors describe creating an out-of-order processor on the fly by federating two neighboring in-order cores. They reuse the large register file of the multi-threaded cores to implement some out-of-order structures and re-engineer other large, associative structures into simpler lookup tables. The resulting federated core provides twice the single-thread performance of the underlying in-order core, allowing the architecture to efficiently support a wider range of parallelism.
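The trade-off described above can be illustrated with a toy throughput model. This is only a sketch under strong assumptions that are not from the paper: one runnable thread per core (multi-threading within a core is ignored), any idle core can be paired with a busy one (adjacency is ignored), and a federated pair delivers exactly the 2x single-thread performance quoted in the abstract. The function name `toy_throughput` and its parameters are hypothetical.

```python
def toy_throughput(num_cores: int, runnable_threads: int,
                   federate: bool, federated_speedup: float = 2.0) -> float:
    """Aggregate throughput in units of one in-order core's single-thread performance.

    Assumptions (illustrative only): one thread per core, idle cores can
    always be paired with busy ones, and a federated pair runs one thread
    at `federated_speedup` times the in-order rate.
    """
    # Threads beyond the core count cannot run concurrently in this model.
    busy = min(runnable_threads, num_cores)
    if not federate:
        return float(busy)

    # Each idle core can be lent to one busy core to form a federated pair.
    idle = num_cores - busy
    pairs = min(busy, idle)
    unpaired = busy - pairs
    return pairs * federated_speedup + unpaired * 1.0


if __name__ == "__main__":
    for threads in (1, 2, 4, 6, 8):
        plain = toy_throughput(8, threads, federate=False)
        fed = toy_throughput(8, threads, federate=True)
        print(f"threads={threads}: in-order={plain}, federated={fed}")
```

In this model federation helps only while some cores would otherwise sit idle; once every core has its own thread, the two configurations deliver the same throughput, which matches the abstract's framing of federation as a mechanism for phases with limited parallelism.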