University of Calgary
In this paper, the authors present the design and software implementation of a high-performance area-efficient Advanced Encryption Standard (AES) cipher on a many-core platform. A preliminary cipher design is partitioned and mapped to an array of 70 small processors, and offers a throughput of 16.625 clock cycles per byte. The usage of instruction and data memory, and the workload of each processor are characterized for further optimization. Through workload balancing and processor fusion, the throughput of the cipher is increased by 43% to 9.5 clock cycles per byte.