Association for Computing Machinery
The power target for exa-scale supercomputing is 20MW, with about 30% budgeted for the memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the large number of memory chips (>10M) required will result in crippling failure rates. Although specialized DRAM memories have been reorganized to reduce power through 3D-stacking or row buffer resizing, their implications on fault tolerance have not been considered. The authors show that addressing reliability and energy is a co-optimization problem involving tradeoffs between error correction cost, access energy and refresh power reducing the physical page size to decrease access energy increases the energy/area over-head of error resilience.