Netflix's open source FlameScope CPU tool helps developers debug performance issues

The new visualization tool instantly generates flame graphs from sections of system profiles.

Building a slide deck, pitch, or presentation? Here are the big takeaways:
  • Netflix created the FlameScope tool to solve a latency problem in a microservice. The tool is being released publicly and features were added to make it more widely applicable to other use cases.
  • The tool is aimed at programmers and operators trying to identify the origin of performance issues.

Netflix's cloud performance engineering team has released FlameScope, a performance visualization utility that lets programmers and system administrators analyze CPU activity. The tool renders a CPU profile as a subsecond-offset heat map. The user selects an arbitrary span of time on the heat map, and FlameScope generates a flame graph for the corresponding block of time.
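To illustrate the idea of a subsecond-offset heat map, here is a minimal Python sketch. It is not FlameScope's code: the function name `subsecond_heatmap`, the `rows` parameter, and the choice to treat the earliest sample as time zero are all assumptions made for illustration. Each column stands for one whole second of the profile, each row for a slice within that second, and each cell counts the CPU samples that landed there.

```python
from collections import Counter


def subsecond_heatmap(timestamps, rows=50):
    """Bin sample timestamps (in seconds) into a grid of counts.

    grid[row][col]: col is the whole second since the first sample,
    row is the offset within that second, split into `rows` slices.
    Hypothetical helper for illustration, not FlameScope's API.
    """
    if not timestamps:
        return []
    start = min(timestamps)  # assumption: earliest sample is time zero
    counts = Counter()
    max_col = 0
    for ts in timestamps:
        elapsed = ts - start
        col = int(elapsed)  # whole seconds elapsed
        # Fraction of the second, scaled to a row index and clamped.
        row = min(int((elapsed - col) * rows), rows - 1)
        counts[(row, col)] += 1
        max_col = max(max_col, col)
    return [[counts[(r, c)] for c in range(max_col + 1)] for r in range(rows)]


# Five samples over two seconds, with two rows per second.
grid = subsecond_heatmap([0.0, 0.1, 0.6, 1.6, 1.7], rows=2)
for line in grid:
    print(line)
```

A short burst of CPU activity shows up as a cluster of high-count cells, which is the region a user would then select for a flame graph.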

According to a Netflix blog post, this tool was originally developed to solve a particular problem at Netflix. A microservice was experiencing spikes in latency approximately every 15 minutes. After a correlation was found between the latency spikes and an increase in CPU utilization that lasted only a few seconds at a time, further troubleshooting was hampered by the difficulty of generating flame graphs for a problem that occurred at this frequency. A one-minute flame graph was too short to reliably capture the spike in CPU utilization, and flame graphs for longer periods were ineffective because the issue became indistinguishable from the normal workload, the post said.

FlameScope, in particular, automates the task of selecting ranges in a CPU profile for visualization in flame graphs. According to the post, this was the impetus for the creation of the tool:

I began by slicing it into ten second ranges, and creating a flame graph for each. This approach looked promising as it revealed variation, so I sliced it even further down to one second windows. Browsing these short windows solved the problem and found the issue, however, it had become a laborious task. I wanted a quicker way.
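The manual slicing described in the quote can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's actual workflow: the sample format (a timestamp paired with a tuple of stack frames) and the helper names `fold_range` and `fold_windows` are hypothetical. The output uses the semicolon-joined "folded stack" form that flame graph generators commonly accept as input.

```python
from collections import Counter


def fold_range(samples, start, end):
    """Count identical stacks among samples with start <= timestamp < end.

    `samples` is a list of (timestamp_seconds, (frame, frame, ...)) pairs,
    a hypothetical in-memory form of a parsed CPU profile.
    """
    folded = Counter()
    for ts, stack in samples:
        if start <= ts < end:
            folded[";".join(stack)] += 1
    return folded


def fold_windows(samples, window):
    """Slice the whole profile into fixed-size windows, one Counter each."""
    if not samples:
        return {}
    t0 = min(ts for ts, _ in samples)
    windows = {}
    for ts, stack in samples:
        index = int((ts - t0) // window)  # which window this sample falls in
        windows.setdefault(index, Counter())[";".join(stack)] += 1
    return windows


samples = [
    (0.2, ("main", "work")),
    (0.7, ("main", "work")),
    (1.1, ("main", "gc")),
    (1.4, ("main", "gc")),
    (1.9, ("main", "work")),
]
# Folded stacks for the one-second range starting at t = 1.0.
for stack, count in fold_range(samples, 1.0, 2.0).items():
    print(stack, count)
```

Browsing the result of `fold_windows` one window at a time is the laborious step the quote describes. Picking the range visually from a heat map replaces that search.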


As this is the initial release, additional features are planned. The authors are actively soliciting outside contributors to implement features and new ideas that would make the utility more general-purpose. The project is written in Python and JavaScript/Node.js.

Presently, FlameScope handles only data from perf on Linux, though support for other profile sources is planned, the post noted. Additional interactive features, such as palette selection and data transformations, as well as the ability to export the resulting flame graph as an SVG, are also priorities.

Netflix has a long history of releasing utilities developed internally for debugging and performance analysis as open source software. Netflix's Chaos Engineering concept and Simian Army suite of resiliency testing tools have been widely adopted both inside and outside the technology industry. Google, Amazon, Microsoft, Dropbox, and Yahoo have adopted Chaos principles in their operations, as have the University of California, Sandia National Labs, Fidelity Investments, and O'Reilly Media.

Image: iStockphoto/FS-Stock

About James Sanders

James Sanders is a Java programmer specializing in software as a service and thin client design, and virtualizing legacy programs for modern hardware.
