Software Pipelined Execution of Stream Programs on GPUs
The authors describe an efficient framework for mapping StreamIt programs onto GPUs. The framework software-pipelines the execution of filters and handles both the scheduling of filters and their assignment to processors. They also present a novel buffer layout technique that helps exploit the high memory bandwidth available on GPUs. The proposed scheduling uses the GPU's scalar units to exploit data parallelism and its multiprocessors to exploit task and pipeline parallelism. It also accounts for the synchronization and bandwidth limitations of GPUs, and yields speedups of 1.87X to 36.83X over single-threaded CPU execution.
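To make the buffer layout point concrete, here is a minimal sketch that assumes a generic interleaved layout. It is not necessarily the exact layout the authors propose. On GPUs, global memory accesses from consecutive threads can be coalesced into fewer transactions when those threads touch consecutive addresses. In a naive layout, each thread owns a contiguous chunk of the buffer, so at any given step the threads access addresses that are strided apart. In the interleaved layout, the j-th item of every thread is stored side by side. The function names (`naive_index`, `interleaved_index`, `relayout`) and the parameters `num_threads` and `items_per_thread` are hypothetical and used only for illustration.

```python
from typing import Callable, List

# Hypothetical index functions: map (thread, elem) to a buffer address.
IndexFn = Callable[[int, int, int, int], int]


def naive_index(thread: int, elem: int, num_threads: int, items_per_thread: int) -> int:
    # Each thread owns a contiguous chunk: thread t's items sit at t*k .. t*k + k - 1.
    return thread * items_per_thread + elem


def interleaved_index(thread: int, elem: int, num_threads: int, items_per_thread: int) -> int:
    # The elem-th item of every thread is stored side by side, one slot per thread.
    return elem * num_threads + thread


def addresses_at_step(index_fn: IndexFn, elem: int, num_threads: int,
                      items_per_thread: int) -> List[int]:
    # Addresses touched when all threads access their elem-th item at the same time.
    return [index_fn(t, elem, num_threads, items_per_thread) for t in range(num_threads)]


def is_contiguous(addrs: List[int]) -> bool:
    # True if consecutive threads hit consecutive addresses (coalescing-friendly).
    return all(b - a == 1 for a, b in zip(addrs, addrs[1:]))


def relayout(buf: List[int], num_threads: int, items_per_thread: int) -> List[int]:
    # Copy a buffer from the naive (per-thread chunk) layout into the interleaved layout.
    out = [0] * len(buf)
    for t in range(num_threads):
        for j in range(items_per_thread):
            src = naive_index(t, j, num_threads, items_per_thread)
            dst = interleaved_index(t, j, num_threads, items_per_thread)
            out[dst] = buf[src]
    return out


if __name__ == "__main__":
    T, K = 4, 3  # 4 threads, 3 items per thread
    print(addresses_at_step(naive_index, 0, T, K))        # [0, 3, 6, 9]
    print(addresses_at_step(interleaved_index, 0, T, K))  # [0, 1, 2, 3]
    print(relayout(list(range(T * K)), T, K))             # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

With 4 threads, the naive layout makes the threads touch addresses 0, 3, 6 and 9 at the first step, while the interleaved layout makes them touch 0 through 3. On real hardware, the benefit also depends on element size and alignment, which this sketch ignores.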