Association for Computing Machinery
To access a set-associative L1 cache in a high-performance processor, all ways of the selected set are searched and fetched in parallel using physical address bits. Such a cache is oblivious of memory references' software semantics such as stack-heap bifurcation of the memory space, and user-kernel ring levels. This constitutes a waste of energy since e.g., a user-mode instruction fetch will never hit a cache block that contains kernel code. Similarly, a stack access will not hit a cacheline that contains heap data.