A Performance Study for Iterative Stencil Loops on GPUs With Ghost Zone Optimizations

Iterative Stencil Loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture, there are usually halo regions that need to be updated and exchanged among different Processing Elements (PEs). In addition, synchronization is often used to signal the completion of halo exchanges. Both communication and synchronization may incur significant overhead on parallel architectures with shared memory. This is especially true in the case of Graphics Processors (GPUs), which do not preserve the state of the per-core L1 storage across global synchronizations.
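The halo exchange described above, and the ghost zone idea named in the title, can be illustrated with a small sketch. It is not the paper's implementation: it assumes a 1D three-point averaging stencil with fixed end cells, runs the tiles sequentially on the CPU, and uses hypothetical names (`step`, `run_reference`, `run_tiled`, `tile`, `halo`). Each tile copies `halo` extra cells per side, so it can run up to `halo` iterations locally before the next exchange, trading redundant computation for fewer synchronization rounds.

```python
def step(cells):
    """One 3-point averaging sweep; the two end cells are held fixed."""
    out = cells[:]
    for i in range(1, len(cells) - 1):
        out[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return out


def run_reference(grid, iterations):
    """Untiled baseline: sweep the whole grid once per iteration."""
    for _ in range(iterations):
        grid = step(grid)
    return grid


def run_tiled(grid, iterations, tile, halo):
    """Tiled sweep with a ghost zone of `halo` cells per side (halo >= 1).

    Returns the final grid and the number of exchange rounds, which
    stand in for global synchronizations on a real parallel machine.
    """
    n = len(grid)
    done = 0
    rounds = 0
    while done < iterations:
        k = min(halo, iterations - done)   # iterations before next exchange
        new = grid[:]
        for start in range(0, n, tile):
            end = min(start + tile, n)
            lo = max(0, start - k)         # ghost cells on the left
            hi = min(n, end + k)           # ghost cells on the right
            local = grid[lo:hi]            # "exchange": copy tile plus halo
            for _ in range(k):
                local = step(local)        # stale values creep in one cell per step
            # Only the tile interior is still valid after k local steps.
            new[start:end] = local[start - lo:end - lo]
        grid = new
        done += k
        rounds += 1
    return grid, rounds


if __name__ == "__main__":
    initial = [float(i * i % 7) for i in range(20)]
    expected = run_reference(initial, 5)
    result, rounds = run_tiled(initial, 5, tile=4, halo=2)
    print(rounds, max(abs(a - b) for a, b in zip(expected, result)))
```

With `halo=1` the sketch exchanges after every iteration; with a wider halo each tile recomputes some of its neighbors' cells but needs fewer rounds, for example three rounds instead of five for five iterations at `halo=2`.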

Provided by: University of Virginia
Topic: Data Centers
Date Added: Jun 2010
Format: PDF
