University of Teramo
Memory bandwidth limitation is one of the major impediments to high-performance microprocessors. This paper investigates a class of store misses that can be eliminated to reduce data traffic: store misses that fetch cache blocks whose original data is never used. If such a block is fully overwritten by subsequent stores, it can be installed directly in the cache without accessing lower levels of the memory hierarchy, eliminating the corresponding data traffic. The authors' results indicate that, for a 1 MB data cache, 28% of cache misses are avoidable across the SPEC CPU INT 2000 benchmarks. They propose a simple hardware mechanism, the Store Fill Buffer (SFB), which directly installs blocks for store misses and substantially reduces data traffic.
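The idea of installing a fully overwritten block without fetching it can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the hardware design evaluated in the paper: the class name `StoreFillBufferSketch`, the 64-byte block size, the per-byte write mask, the unbounded buffer and cache, and the rule that any load to a non-resident block triggers a fetch are all hypothetical simplifications chosen for illustration.

```python
BLOCK_SIZE = 64  # bytes per cache block (assumed, not taken from the paper)


class StoreFillBufferSketch:
    """Toy model: count fetches avoided when stores overwrite a whole block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.full_mask = (1 << block_size) - 1  # one bit per byte in a block
        self.pending = {}   # block number -> bitmask of bytes written so far
        self.cache = set()  # block numbers resident in the cache (unbounded)
        self.fetches = 0    # blocks fetched from lower levels of the hierarchy
        self.avoided = 0    # store-missed blocks installed without a fetch

    def _split(self, addr):
        # Return (block number, byte offset within the block).
        return addr // self.block_size, addr % self.block_size

    def store(self, addr, size):
        # Assumes the access does not cross a block boundary.
        block, offset = self._split(addr)
        if block in self.cache:
            return  # store hit: no traffic
        # Store miss: record the written bytes instead of fetching the block.
        written = self.pending.get(block, 0) | (((1 << size) - 1) << offset)
        if written == self.full_mask:
            # Every byte was overwritten, so the original data is never needed.
            self.pending.pop(block, None)
            self.cache.add(block)
            self.avoided += 1
        else:
            self.pending[block] = written

    def load(self, addr, size):
        block, _ = self._split(addr)
        if block in self.cache:
            return  # load hit
        # Simplification: any load to a non-resident block fetches it,
        # merging with (and discarding the record of) pending partial writes.
        self.pending.pop(block, None)
        self.fetches += 1
        self.cache.add(block)


sfb = StoreFillBufferSketch()
for off in range(0, BLOCK_SIZE, 8):
    sfb.store(0x1000 + off, 8)   # eight 8-byte stores cover the whole block
sfb.store(0x2000, 8)             # partial write to a second block
sfb.load(0x2008, 8)              # load needs original data, forcing a fetch
print(sfb.avoided, sfb.fetches)
```

In this model the first block is installed with no traffic to lower levels, while the second block, which is read before being fully overwritten, still requires a fetch. A real mechanism would also need to handle buffer capacity and eviction of partially written blocks, which this sketch omits.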