University College Cork
Parallel Matrix-Matrix Multiplication (MMM) is a fundamental part of the linear algebra libraries used by scientific applications on high performance computers. As heterogeneous systems have emerged as high performance computing platforms, traditional homogeneous algorithms have been adapted to these heterogeneous environments. Although heterogeneous systems have been in use for some time, how to partition data optimally across heterogeneous processors so as to minimize computation, communication, and execution time remains an open problem. The question of how to subdivide MMM problems among heterogeneous processors has been studied, but that prior work assumes that the data partition shape, that is, the layout of the data within the matrix assigned to each processor, should be rectangular.
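To make the notion of a rectangular data partition concrete, the following is a minimal sketch and not an algorithm taken from the prior work mentioned above. It assumes the simplest possible rectangular layout: an n x n matrix cut into full-height column slabs, one per processor, with widths roughly proportional to integer relative speeds. The function name `rectangular_column_partition` and the speed values are hypothetical, chosen only for illustration.

```python
def rectangular_column_partition(n, speeds):
    """Split an n x n matrix into full-height column slabs, one per processor.

    speeds: positive integers giving relative processor speeds (an assumption
    of this sketch). Returns one rectangle per processor as a tuple
    (row_start, row_end, col_start, col_end), with end indices exclusive.
    """
    total = sum(speeds)
    # Floor of each processor's ideal (fractional) share of the n columns.
    widths = [n * s // total for s in speeds]
    # Columns left over after flooring go to the processors with the
    # largest fractional remainders, one column each.
    leftover = n - sum(widths)
    order = sorted(range(len(speeds)),
                   key=lambda i: (n * speeds[i]) % total,
                   reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    # Turn the widths into adjacent, non-overlapping rectangles.
    rects = []
    col = 0
    for w in widths:
        rects.append((0, n, col, col + w))
        col += w
    return rects


if __name__ == "__main__":
    # Three hypothetical processors with relative speeds 3 : 2 : 1.
    for rect in rectangular_column_partition(12, [3, 2, 1]):
        print(rect)
```

Every processor here owns a single rectangle whose area approximates its share of the work. A non-rectangular partition shape would relax exactly this constraint by allowing a processor's elements to form some other region of the matrix.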