Toward High Throughput Algorithms on Many Core Architectures
Advanced many-core CPU chips already contain a few hundred processing cores (e.g., 160 cores in an IBM Cyclops-64 chip), and core counts continue to grow as computer architecture progresses. The run-time systems underlying such architectures must efficiently serve hundreds of processors at the same time, which requires every basic data structure within the run-time to sustain unprecedented throughput. In this paper, the authors analyze the throughput requirements that algorithms in run-time systems must meet to handle hundreds of simultaneous operations in real time.