From Serial Loops to Parallel Execution on Distributed Systems
Programmability and performance portability remain two major challenges in scientific computing. So far, auto-parallelization has provided suboptimal solutions: auto-generated code tends to underperform and is commonly limited to shared-memory environments. In this paper, the authors build upon an existing run-time system designed to efficiently schedule and execute fine-grain, task-based applications on heterogeneous, distributed-memory environments. They present an automatic compiler tool that analyzes the data flow of serial codes made up of imperfectly nested, affine loop nests containing if statements. This tool serves as the front-end source-code compiler of the run-time system: it automatically converts an input serial code into the run-time's internal representation of the corresponding task system.
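
As a rough illustration of what data-flow analysis of such a loop nest yields, the following minimal Python sketch treats each statement instance of a toy, imperfectly nested loop with an affine guard as a task and records which earlier task produced each value it reads. It is not the authors' tool: it enumerates instances by brute force for a fixed problem size rather than analyzing the loops at compile time. The loop nest, the array indices, and the names build_task_graph, S1, and S2 are all hypothetical and chosen only for this example.

```python
def build_task_graph(n):
    """Enumerate a toy loop nest and return its flow (read-after-write) edges.

    Toy serial code being modeled (hypothetical, not from the paper):
        for k in 0..n-1:
            S1: A[k] = f(A[k])              # sits between loop levels
            for i in k+1..n-1:
                if i >= k + 2:              # affine guard
                    S2: A[i] = g(A[i], A[k])
    """
    last_writer = {}  # array index -> task that most recently wrote it
    edges = []        # (producer task, consumer task, array index)

    def record(task, reads, write):
        # Every read depends on the last task that wrote that element.
        for elem in reads:
            if elem in last_writer:
                edges.append((last_writer[elem], task, elem))
        last_writer[write] = task

    for k in range(n):
        record(("S1", k), reads=[k], write=k)
        for i in range(k + 1, n):
            if i >= k + 2:
                record(("S2", k, i), reads=[i, k], write=i)
    return edges


if __name__ == "__main__":
    for producer, consumer, elem in build_task_graph(4):
        print(f"{producer} -> {consumer} via A[{elem}]")
```

Each edge names a producer task, a consumer task, and the array element that flows between them. Tasks with no path between them in this graph are free to run in parallel, which is the kind of information a task-based run-time needs in order to schedule work.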