Architectural Support for the Orchestration of Fine-Grained Multiprocessing for Portable Streaming Applications
Handheld devices are expected to start using fine-grained ASIC accelerators to meet energy-efficiency requirements of increasingly complex applications, e.g., video decoding and reconfigurable radio. To avoid overhead, static multiprocessor schedules are preferable for orchestrating fine-grained accelerators. However, as modern applications use accelerators in irregular patterns, static scheduling leads to low hardware utilization. Run-time scheduling for fine-grained accelerators solves the utilization problem, but easily produces significant overhead. The authors propose an efficient Accelerator Management Unit (AMU), implemented in hardware. E.g., in video decoding, the AMU takes 3 to 18 cycles to compute a macroblock decoding schedule.