New Jersey Institute of Technology
Vector coprocessor (VP) resources are often underutilized because application code lacks sustained data-level parallelism (DLP) or exhibits vector-length variations. The authors' work is motivated by three observations: vector operations are omnipresent in high-performance scientific and embedded applications; both performance and energy efficiency are required; and applications must often handle vectors of varying size. Their design for VP sharing in multicores enhances performance while maintaining low area and energy costs. Their 40 nm ASIC design yields 16.66 GFLOPs/Watt.