Date Added: Aug 2011
The growing number of execution units in many-core processors is driving numerous paradigm changes in parallel systems. Earlier techniques that focused solely on obtaining correct results are becoming obsolete unless they can also deliver those results efficiently. This paper addresses the specific problem of efficiently supporting fine-grained task creation and task termination in run-time systems for shared-memory processors. The authors' contributions are motivated by their observation that, in High Performance Computing (HPC) programs, it is common for a large number of similar fine-grained tasks to become enabled at the same time.