An Optimized Matrix Multiplication on ARMv7 Architecture

Provided by: AICIT
Topic: Data Centers
Format: PDF
A well-optimized matrix multiplication on embedded systems can speed up data processing in high-performance mobile measuring equipment, since many core mathematical algorithms are built on matrix multiplication. In this paper, the authors propose a matrix multiplication optimized specifically for the ARMv7 architecture. The performance-critical differences between ARMv7 and conventional desktop/server architectures are taken into account when blocking the simple implementation. In addition, the Advanced SIMD (Single Instruction, Multiple Data) engine, NEON, is exploited to increase arithmetic computing performance and decrease memory access latency.
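
As a rough illustration of what blocking a simple matrix multiplication can look like, here is a minimal sketch in portable C. It is not the authors' implementation: the function name `matmul_blocked`, the row-major square layout, and the tile size `TILE` are assumptions made for this example, and a real tile size would be tuned to the target's cache. The innermost loop walks contiguous memory, which is the kind of loop a SIMD engine such as NEON could process several elements at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; a real value would be tuned to the target cache. */
#define TILE 4

/* Smaller of two sizes, used to clip a tile at the matrix edge. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/*
 * Computes C += A * B for row-major n x n float matrices,
 * visiting the operands in TILE x TILE blocks so that each block
 * is reused while it is likely still in cache.
 * The caller is expected to zero C first if a plain product is wanted.
 */
void matmul_blocked(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_size(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_size(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_size(jj + TILE, n);

                /* Multiply one block of A by one block of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float aik = a[i * n + k];
                        /* Contiguous accesses to B and C: a candidate
                           for vectorization on a SIMD unit. */
                        for (size_t j = jj; j < j_end; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Because the routine accumulates into `c`, calling it on a zero-initialized output gives the ordinary product. The `min_size` clipping lets it handle matrix sizes that are not a multiple of `TILE`.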
