Institute of Electrical and Electronics Engineers
The newest GPU architecture, Kepler, offers a reconfigurable L1 cache per streaming multiprocessor, with different cache sizes and cache associativities. Both of these cache parameters affect the overall performance of cache-intensive algorithms, i.e., algorithms that intensively reuse data. In this paper, the authors analyze the impact of different L1 cache configurations on the execution of the matrix multiplication algorithm for different problem sizes. The basis of their research is an existing theoretical analysis of the performance drawbacks that appear for matrix multiplication when it is executed on multi-core CPUs.
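The following Python sketch shows concretely why cache size and associativity matter for a data-reusing algorithm. It replays the read address trace of a naive row-major matrix multiplication through a simple set-associative cache model with LRU replacement. Everything here is an illustrative assumption, not Kepler's documented L1 behavior and not the authors' method. This includes the 128-byte line, the LRU policy, the i-j-k loop order, 4-byte elements, and the hypothetical helpers `SetAssocCache`, `matmul_trace` and `miss_rate`. The 16, 32 and 48 KB sizes in the demo loosely mirror the L1 splits of Kepler's 64 KB per-SM on-chip memory.

```python
from collections import OrderedDict


class SetAssocCache:
    """Simplified set-associative cache with LRU replacement (assumed model, not Kepler's real L1)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(cache_set) >= self.ways:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[tag] = True


def matmul_trace(n, elem_bytes=4):
    """Yield byte addresses of A and B read by naive C = A*B (row-major, i-j-k order).

    C is assumed to be accumulated in a register, so its accesses are not traced.
    """
    a_base = 0
    b_base = n * n * elem_bytes  # B is stored directly after A
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield a_base + (i * n + k) * elem_bytes  # A[i][k]
                yield b_base + (k * n + j) * elem_bytes  # B[k][j]


def miss_rate(n, size_bytes, ways, line_bytes=128):
    """Fraction of traced reads that miss in the modeled cache."""
    cache = SetAssocCache(size_bytes, line_bytes, ways)
    for addr in matmul_trace(n):
        cache.access(addr)
    return cache.misses / (cache.hits + cache.misses)


if __name__ == "__main__":
    # Compare hypothetical L1 configurations for a 64 x 64 problem (A and B are 16 KB each).
    for size_kb in (16, 32, 48):
        for ways in (1, 4):
            print(f"{size_kb} KB, {ways}-way: miss rate {miss_rate(64, size_kb * 1024, ways):.4f}")
```

When A and B together fit in the modeled cache without set conflicts, only compulsory misses remain. A smaller or direct-mapped configuration adds capacity and conflict misses, because B is placed right after A and their lines compete for the same sets. This is the kind of effect the paper measures on real hardware.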