An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware

Provided by: University of Tehran
Topic: Hardware
Format: PDF
The enormous gap between the high-performance capabilities of today's CPUs and the speed of off-chip communication poses extreme challenges to the development of numerical software that is scalable and achieves high performance. In this paper, the authors describe a successful methodology to address these challenges, starting with their algorithm design, continuing through kernel optimization and tuning, and finishing with their programming model. Together, these lead to the development of a scalable, high-performance Singular Value Decomposition (SVD) solver. The authors developed a set of highly optimized kernels and combined them with advanced optimization techniques that feature fine-grained, cache-contained kernels, a task-based approach, and a hybrid execution and scheduling runtime, all of which significantly increase the performance of their SVD solver.