The University of Tulsa
Shared-memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared-memory applications, and their significance is growing as network latency increases relative to processor speeds. In this paper, the authors propose two mechanisms that improve shared-memory performance by eliminating remote misses, reducing the amount of communication required to maintain coherence, or both. They focus on applications that exhibit producer-consumer sharing and present a simple hardware mechanism for detecting this sharing pattern.