High Performance Hardware Operators for Data Level Parallelism Exploration

Many microprocessor vendors have incorporated high-performance Single Instruction, Multiple Data (SIMD) operators into their processors to meet the performance demands of growing multimedia workloads. This paper presents recent work on the hardware implementation of these operators for Data-Level Parallelism (DLP) exploration. It first describes two general architectural techniques for designing operators with SIMD support: a low-precision-based scheme and a high-precision-based scheme. It then presents new designs for both integer and floating-point operators, aimed at achieving the best tradeoff between cost and performance.
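To make the idea of an operator with SIMD support concrete, the following is a minimal software model of one common approach, offered as an illustrative assumption rather than as any of the designs presented in this paper. A single 32-bit adder is reused for four 8-bit, two 16-bit, or one 32-bit addition by breaking its carry chain at lane boundaries. The function names `simd_add` and `lane_top_bits` and the lane widths chosen here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the most significant bit of every lane set,
   e.g. 0x80808080 for 8-bit lanes, 0x80008000 for 16-bit lanes. */
static uint32_t lane_top_bits(unsigned lane_bits)
{
    uint32_t h = 0;
    for (unsigned i = lane_bits - 1; i < 32; i += lane_bits)
        h |= 1u << i;
    return h;
}

/* Hypothetical model of a partitioned adder: adds the packed lanes of a and b
   modulo 2^lane_bits per lane, with no carry crossing a lane boundary. */
uint32_t simd_add(uint32_t a, uint32_t b, unsigned lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 || lane_bits == 32);
    uint32_t h = lane_top_bits(lane_bits);

    /* Add all bits except each lane's top bit. The carry out of the lower
       bits lands in the lane's top bit position and cannot reach the next
       lane, which plays the role of the carry-kill signal in hardware. */
    uint32_t low = (a & ~h) + (b & ~h);

    /* Recover each lane's top bit as a ^ b ^ carry-in; the lane's carry-out
       is discarded, giving wrap-around (modular) lane arithmetic. */
    return low ^ ((a ^ b) & h);
}
```

For example, with 8-bit lanes, adding 0x01FF7F80 and 0x01010101 yields 0x02008081: the 0xFF + 0x01 lane wraps to 0x00 without disturbing its neighbor. Masking each lane's top bit is a software stand-in for the lane-boundary gating a hardware designer would insert into the carry chain; the actual cost and timing tradeoffs of such gating are outside what this sketch models.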