Date Added: Apr 2010
Sharing data between processors becomes increasingly expensive as the number of cores in a system grows. In particular, network processing overhead on larger systems can reach tens of thousands of CPU cycles per TCP packet, for just hundreds of "useful" instructions. Most of these cycles are spent waiting. The CPU stalls while accessing "bouncing" cache lines, which hold network control data shared by all processors in the system and migrate back and forth between their caches, and while synchronizing access to this shared state. In many cases, the resulting excessive CPU utilization limits overall system performance.