Implementing a Blocked Aasen's Algorithm With a Dynamic Scheduler on Multicore Architectures
Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, it is difficult to develop a scalable factorization algorithm that both guarantees numerical stability through pivoting and takes advantage of symmetry, because such an algorithm exhibits many of the fundamental challenges of parallel programming, such as irregular data accesses and irregular task dependencies. In this paper, the authors address these challenges in a tiled implementation of a blocked left-looking Aasen's algorithm. To exploit parallelism in this left-looking algorithm, they study several performance-enhancing techniques, including parallel reduction to update a panel, panel factorization by tall-skinny LU, and parallel symmetric pivoting.
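As a rough illustration of the first of these techniques, the sketch below applies a pairwise (tree) reduction to a schematic panel update of the form W = A - (L_1 H_1 + L_2 H_2 + ...). This is not the authors' implementation: the update formula is simplified, the names `update_panel`, `tree_sum`, `l_blocks`, and `h_blocks` are hypothetical, and a thread pool stands in for the dynamic scheduler. The point is only the dependency structure: the block products are independent, and their partial sums can be combined level by level rather than one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Dense product of two small matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matsub(a, b):
    """Elementwise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tree_sum(terms, pool):
    """Sum a non-empty list of matrices by pairwise reduction.

    Each pass adds neighbouring pairs concurrently, so the number of
    dependent steps grows like log2(len(terms)) instead of len(terms).
    """
    while len(terms) > 1:
        pairs = [(terms[i], terms[i + 1]) for i in range(0, len(terms) - 1, 2)]
        leftover = [terms[-1]] if len(terms) % 2 else []
        terms = list(pool.map(lambda p: matadd(p[0], p[1]), pairs)) + leftover
    return terms[0]


def update_panel(a_panel, l_blocks, h_blocks, workers=4):
    """Return a_panel - sum_k l_blocks[k] @ h_blocks[k] (schematic update).

    Assumption: all products have the same shape as a_panel.
    """
    if not l_blocks:
        return [row[:] for row in a_panel]  # nothing to subtract
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The block products are mutually independent tasks.
        products = list(pool.map(matmul, l_blocks, h_blocks))
        total = tree_sum(products, pool)
    return matsub(a_panel, total)


if __name__ == "__main__":
    a = [[10.0, 10.0], [10.0, 10.0]]
    ls = [[[1, 0], [0, 1]], [[2, 0], [0, 2]], [[1, 1], [1, 1]]]
    hs = [[[1, 2], [3, 4]], [[1, 0], [0, 1]], [[1, 0], [0, 1]]]
    print(update_panel(a, ls, hs))
```

Pure-Python threads will not produce a real speedup here. In a tiled code each product and each pairwise addition would instead be a tile-level task handed to the runtime, which is where the shorter dependency chain of the tree pays off.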