Fine-Grain OpenMP Runtime Support with Explicit Communication Hardware Primitives

Provided by: edaa
Topic: Hardware
Format: PDF
The authors present a runtime system that uses the explicit on-chip communication mechanisms of the SARC multi-core architecture, to implement efficiently the OpenMP programming model and enable the exploitation of fine-grain parallelism in OpenMP programs. They explore the design space of implementation of OpenMP directives and runtime intrinsics, using a family of hardware primitives; remote stores, remote DMAs, hardware counters and hardware event queues with automatic responses, to support static and dynamic scheduling and data transfers in local memories. Using an FPGA prototype with four cores, they achieve OpenMP task creation latencies of 30-35 processor clock cycles, initiation of parallel contexts in 50 cycles and synchronization primitives in 65-210 cycles.

Find By Topic