Lightweight, High-Resolution Monitoring for Troubleshooting Production Systems
Source: Princeton University
Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously collects profiles of low-level OS events (e.g., scheduling, L2 cache misses, CPU utilization, I/O operations, page allocation, locking) at the granularity of executables, procedures, and instructions. Chopstix then reconstructs these events offline for analysis. The authors have used Chopstix to diagnose several elusive problems in a large-scale production system, thereby reducing these intermittent problems to reproducible bugs that can be debugged using standard techniques.