Lightweight, High-Resolution Monitoring for Troubleshooting Production Systems

Executive Summary

Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes Chopstix, a new diagnostic tool that continuously collects profiles of low-level OS events (e.g., scheduling, L2 cache misses, CPU utilization, I/O operations, page allocation, locking) at the granularity of executables, procedures, and instructions. Chopstix then reconstructs these events offline for analysis. The authors have used Chopstix to diagnose several elusive problems in a large-scale production system, reducing these intermittent problems to reproducible bugs that can be debugged using standard techniques.

  • Format: PDF
  • Size: 198.7 KB