Exploiting Fine-Grained Data Parallelism With Chip Multiprocessors and Fast Barriers

Source: University of California

The authors examine whether chip multiprocessors (CMPs), given their lower on-chip communication latencies, can exploit data parallelism at inner-loop granularities similar to those commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, so they explore how different barrier mechanisms affect the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data-parallel tasks, they present barrier filters, a mechanism for fast barrier synchronization on CMPs that enables vector computations to be distributed efficiently across the cores.
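
To illustrate why this style of parallelization produces frequent barriers, here is a minimal sketch of a two-phase vector computation split across worker threads. It is not the paper's barrier filter mechanism, which the abstract describes as an on-chip facility; it uses an ordinary software barrier (Python's `threading.Barrier`) purely to show where synchronization points fall. The function name `two_phase_vector_op`, the specific arithmetic, and the chunking scheme are all assumptions made for illustration.

```python
import threading


def two_phase_vector_op(a, x, y, num_workers=4):
    """Hypothetical example: z = a*x + y, then w[i] = z[i] + z[(i+1) % n].

    Each inner loop is split across workers, so a barrier is needed
    between the two loops (phase 2 reads z values other workers wrote).
    """
    n = len(x)
    z = [0.0] * n
    w = [0.0] * n
    barrier = threading.Barrier(num_workers)  # software barrier, not a barrier filter

    def worker(tid):
        # Each worker owns one contiguous chunk of the index space.
        lo = tid * n // num_workers
        hi = (tid + 1) * n // num_workers

        # Phase 1: element-wise work on this worker's chunk.
        for i in range(lo, hi):
            z[i] = a * x[i] + y[i]

        # Every worker must finish phase 1 before any worker reads a
        # neighbor's z value in phase 2.
        barrier.wait()

        # Phase 2: reads z[i+1], which may belong to another worker.
        for i in range(lo, hi):
            w[i] = z[i] + z[(i + 1) % n]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

Every short inner loop parallelized this way ends in a barrier, so when each chunk is only a handful of elements the cost of the barrier itself can dominate the useful work. That ratio is the motivation the abstract gives for studying faster barrier mechanisms.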
Format: PDF
Size: 1140.90
Date: Oct 2006