Performance Evaluation of a Simplified Matrix Processor
Data-parallel applications are growing in importance and demand increasing performance from hardware. Because the fundamental data structures of a wide variety of data-parallel applications are scalars, vectors, and matrices, this paper describes the authors' proposed Simple Matrix Processor (SMP) for executing scalar/vector/matrix instructions and evaluates its performance. SMP extends a scalar ISA with vector and matrix instruction sets so that the mixture of scalar, vector, and matrix instructions needed by data-parallel applications can be processed effectively on the same hardware. Scalar, vector, and matrix instructions are fetched from the instruction cache, decoded, and executed on the same execution datapath.
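To make the idea of a single fetch/decode/execute path for all three instruction classes concrete, the following is a minimal software sketch, not the SMP design itself. The mnemonics (`s.add`, `v.add`, `m.add`), the register names, and the `Instr` and `run` helpers are hypothetical and chosen only for illustration. The sketch models functional behavior only and says nothing about timing or hardware structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instr:
    op: str    # hypothetical mnemonic: "s.add", "v.add", or "m.add"
    dst: str   # destination register name
    src1: str  # first source register name
    src2: str  # second source register name


def add_scalar(a, b):
    return a + b


def add_vector(a, b):
    # Element-wise addition of two equal-length vectors.
    return [x + y for x, y in zip(a, b)]


def add_matrix(a, b):
    # Row-by-row addition of two equally shaped matrices.
    return [add_vector(row_a, row_b) for row_a, row_b in zip(a, b)]


# Decode table: every instruction class maps to an operation that is
# invoked from the same execute step below.
EXECUTE: Dict[str, Callable] = {
    "s.add": add_scalar,
    "v.add": add_vector,
    "m.add": add_matrix,
}


def run(program: List[Instr], regs: dict) -> dict:
    pc = 0
    while pc < len(program):
        instr = program[pc]            # fetch from the instruction stream
        operation = EXECUTE[instr.op]  # decode the mnemonic
        regs[instr.dst] = operation(regs[instr.src1], regs[instr.src2])  # execute
        pc += 1
    return regs


# A mixed scalar/vector/matrix program sharing one instruction stream.
program = [
    Instr("s.add", "s0", "s1", "s2"),
    Instr("v.add", "v0", "v1", "v2"),
    Instr("m.add", "m0", "m1", "m2"),
]
regs = {
    "s1": 1.0, "s2": 2.0,
    "v1": [1.0, 2.0], "v2": [3.0, 4.0],
    "m1": [[1.0, 0.0], [0.0, 1.0]], "m2": [[1.0, 2.0], [3.0, 4.0]],
}
result = run(program, regs)
```

The point of the sketch is only that one loop handles all three instruction classes in program order; how SMP actually organizes its registers and datapath is described in the body of the paper.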