University of Maryland
Transient network stalls that degrade application performance frustrate users and developers alike. Software bugs, network congestion, and intermittent connectivity all produce the same symptoms: low throughput, high latency, and user-level timeouts. In this paper, the authors show how an end host can identify the sources of network stalls using only simple counters from its local network stack. By viewing the network stack as a producer-consumer dependency graph and monitoring its activity as a whole, their rule-based expert system correctly identifies which modules are hampering performance over 99% of the time, with false positive rates under 3%.
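To make the producer-consumer view concrete, the following is a minimal sketch of how a single rule over per-module counters might single out a stalled module. It is not the authors' system: the module names (`app`, `tcp`, `nic`), the `Counters` fields, and the rule itself are assumptions chosen for illustration. The rule flags a module that has pending work but made no progress during a sampling interval, and blames it only if none of its downstream consumers is stalled as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Cumulative counters for one module (hypothetical field names)."""
    enqueued: int  # items handed to this module so far
    dequeued: int  # items this module has passed on so far


def stalled(before, after):
    """True if the module had a backlog but passed nothing on in the interval."""
    backlog = after.enqueued - after.dequeued
    progress = after.dequeued - before.dequeued
    return backlog > 0 and progress == 0


def root_causes(downstream, before, after):
    """Return stalled modules whose downstream consumers are not stalled.

    downstream -- dict mapping each module to the modules it feeds
    before     -- dict mapping each module to Counters at interval start
    after      -- dict mapping each module to Counters at interval end
    """
    is_stalled = {m: stalled(before[m], after[m]) for m in downstream}
    return [
        m for m in downstream
        if is_stalled[m] and not any(is_stalled[d] for d in downstream[m])
    ]


# Illustrative three-stage stack: app -> tcp -> nic (assumed topology).
downstream = {"app": ["tcp"], "tcp": ["nic"], "nic": []}
before = {
    "app": Counters(100, 100),
    "tcp": Counters(100, 90),
    "nic": Counters(90, 90),
}
after = {
    "app": Counters(120, 100),  # new work arrived, none handed to tcp
    "tcp": Counters(100, 90),   # backlog of 10, nothing sent on
    "nic": Counters(90, 90),    # idle: no backlog
}
print(root_causes(downstream, before, after))  # prints ['tcp']
```

In this example both `app` and `tcp` look stalled in isolation, but `app` is only waiting on `tcp`, so the dependency graph narrows the blame to `tcp`. A real rule set would need many more counters and rules than this single check.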