University College Cork
Communication on hierarchical heterogeneous HPC platforms can be optimized using topology information. For MPI, a major programming tool for such platforms, a number of topology-aware implementations of collective operations have been proposed for optimal scheduling of messages. This approach improves communication performance and does not require modifying the application source code. However, it applies only to collective operations and does not affect the parts of the application that rely on point-to-point exchanges. In this paper, the authors address the problem of efficient execution of data-parallel applications on interconnected clusters and present a topology-aware optimization that improves data partitioning by taking into account the entire communication flow of the application.