Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation
GPU-to-CPU translation can extend the execution of Graphics Processing Unit (GPU) programs to multi-/many-core CPUs, and hence enable cross-device task migration and promote whole-system synergy. This paper describes some of the authors' findings on the treatment of GPU synchronizations during the translation process. They show that careful dependence analysis may allow a fine-grained treatment of synchronizations and reveal redundant computation at the instruction-instance level. Based on thread-level dependence graphs, they present a method that enables such fine-grained treatment automatically. Experiments demonstrate that, compared to existing translations, the new approach can yield speedups of integer factors.