North Carolina State University
Data-parallel languages feature fine-grained parallel primitives that compilers can map onto modern many-core architectures, where data parallelism must be exploited to fully utilize the hardware. Previous research has focused on compiling data-parallel languages to SIMD (Single Instruction Multiple Data) architectures; however, directly applying those techniques to today's SIMT (Single Instruction Multiple Thread) architectures does not guarantee competitive performance. The authors propose cuNesl, a compiler framework that translates and optimizes NESL programs into parallel CUDA code for SIMT architectures. By converting recursive calls into while loops, they ensure that the hierarchical execution model of GPUs can be exploited on the "flattened" code.
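The recursion-to-loop conversion mentioned above can be illustrated with a minimal CPU-side sketch. This is not cuNesl's actual transformation, only the general idea: a self-recursive function is rewritten as a while loop over an accumulator, so that no call stack is needed per thread (the function names below are hypothetical, chosen for illustration).

```cpp
#include <cassert>

// Recursive form: the kind of self-recursive call a source program might contain.
long sum_rec(long n) {
    return n == 0 ? 0 : n + sum_rec(n - 1);
}

// Iterative form: the same computation with the recursion converted into a
// while loop over an explicit accumulator. On a GPU, eliminating per-call
// stack frames like this lets each thread run the flattened code directly.
long sum_iter(long n) {
    long acc = 0;
    while (n > 0) {
        acc += n;
        --n;
    }
    return acc;
}
```

Both forms compute the same result (e.g., `sum_rec(10)` and `sum_iter(10)` both yield 55); the iterative form is the shape a compiler would emit for SIMT execution.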