How the Meltdown and Spectre chip flaws will impact cloud computing

Mitigations for two critical architectural flaws in CPUs can cause performance degradation, but real-world impact is lower than synthetic benchmarks suggest.

In the wake of the Meltdown and Spectre architectural flaws, cloud firms are scrambling to patch these vulnerabilities. Both AMD and Intel processors are affected by the pair, but only Intel processors are vulnerable to all the attack variants. While software patches can partially mitigate these flaws, the core issue, a design oversight that can only be fully fixed with revised hardware, remains.

The first patch, Kernel Page Table Isolation (KPTI), has been the subject of much speculation, as early reports estimated a performance regression of 30%. So far, real-world impact has been lower than that figure. Naturally, all performance is workload-dependent. While synthetic tests are a good indicator of the speed of certain operations, they often exaggerate highs and lows compared to real-world workloads.

Demystifying claims of KPTI slowdown

KPTI corrects part of the vulnerability by separating user-space and kernel-space page tables. This necessarily introduces a performance penalty, because every system call or interrupt now incurs additional context-switching overhead. Workloads that rely extensively on system calls and interrupts will therefore be impacted the most after patching. However, process-context identifiers (PCIDs) reduce that overhead: the feature tags entries in the translation lookaside buffer (TLB), so a process cannot use entries that are not associated with the active process. With this protection in place, the cycle of flushing and repopulating the TLB can be avoided.
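To get a rough feel for why syscall-heavy workloads are hit hardest, the following minimal sketch times a loop that enters the kernel on every iteration against a loop that stays in user space. It is an illustration only, not one of the benchmarks discussed in this article: the function names and iteration counts are made up for the example, and Python interpreter overhead will mask part of the real cost. Running it on the same machine with KPTI enabled and disabled would be expected to move the first number more than the second.

```python
import os
import time


def time_syscalls(iterations=200_000):
    """Return average seconds per os.getppid() call (one kernel entry each)."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()  # thin wrapper around the getppid system call
    return (time.perf_counter() - start) / iterations


def time_userspace(iterations=200_000):
    """Return average seconds per iteration of pure user-space arithmetic."""
    total = 0
    start = time.perf_counter()
    for i in range(iterations):
        total += i * i  # never leaves user space
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    print(f"syscall loop:    {time_syscalls() * 1e9:.0f} ns per iteration")
    print(f"user-space loop: {time_userspace() * 1e9:.0f} ns per iteration")
```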

Hardware support for PCIDs was introduced with the Westmere generation of processors, though support for the feature was only enabled in version 4.14 of the Linux kernel. While KPTI has been backported to the 4.4 and 4.9 kernels, support for PCIDs has not been. Comparing results across the three kernels, with KPTI enabled and disabled, is perhaps the best indicator currently available of the performance regression with and without PCID support. A few such benchmarks have been run through Open Benchmarking.
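For readers who want to check their own systems, here is a small sketch, assuming an x86 Linux machine, that looks for the pcid flag in /proc/cpuinfo and reads the kernel's Meltdown status file. The helper names are hypothetical, and the sysfs file is only present on kernels that expose it, so the script prints None when it is missing.

```python
from pathlib import Path


def cpu_has_flag(cpuinfo_text, flag):
    """Return True if `flag` appears in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()  # exact match, so 'invpcid' != 'pcid'
    return False


def read_text_or_none(path):
    """Return the stripped contents of `path`, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cpuinfo = read_text_or_none("/proc/cpuinfo") or ""
    print("PCID advertised by CPU:", cpu_has_flag(cpuinfo, "pcid"))
    # Only present on kernels that report vulnerability status via sysfs.
    status = read_text_or_none("/sys/devices/system/cpu/vulnerabilities/meltdown")
    print("Meltdown status:", status)
```

A CPU that advertises pcid still needs a kernel that uses the feature, which is why the kernel version matters in the comparisons that follow.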

The most interesting comparison to make here is the performance of PostgreSQL, as developer Andres Freund posted benchmarks on January 2nd indicating regressions of 7-17% with PCID, and 16-23% without it. While the systems tested and the exact configuration of pgbench differ slightly, the Open Benchmarking test bears out the 23% figure in the worst-case scenario. However, three tests comparing the 4.4 kernel (KPTI disabled, no PCID support) with the 4.14 kernel (KPTI and PCID enabled) show a performance increase of 9.9% for normal load, 6.0% for single-threaded, and 17.95% for heavy contention. While the benchmarks clearly indicate a performance regression for lateral kernel upgrades, that is, patching within the same kernel version, upgrading to the newest kernel eliminates the performance penalty in this situation.
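The percentages above are all relative changes against a chosen baseline, which is why the same patch can look like a regression or an improvement depending on which kernels are compared. A minimal sketch of that arithmetic follows. The throughput figures are hypothetical, picked only to reproduce changes of the same size as the worst-case and normal-load numbers quoted above; they are not measured results.

```python
def percent_change(baseline, measured):
    """Percent change of `measured` relative to `baseline` (negative = slower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (measured - baseline) / baseline


# Hypothetical transactions-per-second figures, for illustration only.
results = {
    "4.4, KPTI off": 1000.0,
    "4.4, KPTI on (no PCID)": 770.0,
    "4.14, KPTI on + PCID": 1099.0,
}

baseline = results["4.4, KPTI off"]
for name, tps in results.items():
    print(f"{name}: {percent_change(baseline, tps):+.1f}% vs. unpatched 4.4")
```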

This effect is not limited to PostgreSQL. Similar comparisons for Redis show a 30% performance increase for GET and 33.5% performance increase for SET benchmarks.

However, not all use cases see a performance increase when moving from an insecure 4.4 kernel to a secure 4.14 kernel. Rendering in Blender, as well as compiling Apache or the Linux kernel, shows only marginal differences. Static page serving in Apache is roughly 20% slower, though the application of KPTI accounts for only a 5% performance penalty; as such, newer kernels seemingly have a performance regression for other reasons. With PostgreSQL, Linux 4.14 is the fastest overall in this database benchmark, and this is one of the I/O workloads where Kernel Page Table Isolation does cause a performance penalty. But the relative performance of the three tested kernels with KPTI on and off was about the same, with no significantly different performance from the older kernels lacking PCID optimizations.

As an overall view, Linus Torvalds suggested that performance penalties should be around 5%.

New measures against Meltdown and Spectre

To combat the chip flaws, Google has announced its homegrown solution, Retpoline, which requires recompiling the operating system and any applications that may execute untrusted code. This fix, along with a microcode update from Intel that introduces indirect branch restricted speculation (IBRS) for Skylake and newer series processors, can defend against both variants of Spectre. Retpoline has already been deployed on Google Cloud Platform.

Securing the future of the cloud computing industry

Given the increasing number of VM escape attacks, like the high-profile VENOM attack of 2015, as well as vulnerabilities discovered last year in Hyper-V and VMware products, security measures need to be increased to maintain trust in cloud computing. There are still measures that can be taken. According to Rob Szumski, the Tectonic product manager at CoreOS:

Hardening security can be done in multiple ways — through containers, updates, and more. Containers on top of VMs are already improving the status quo, where different software packages are run without any barriers between them. Containers greatly increase the difficulty of a vulnerability in software A from attacking software B or the host itself.

Customer machines will always need to be continually updated as more techniques are found that exploit the flaws, such as the ones found this week, or any future bugs in the modern computing stack. Patching software is the fundamental solution to staying secure - software bugs are never going away. Cloud provider hypervisors are getting increasingly customized and stripped down to suit each provider's needs and reduce the attack surface area.

What's your view?

Have you discovered any measurable performance degradation after patching for Meltdown and Spectre? Are you moving to newer kernel versions to mitigate potential performance problems? Share your experiences in the comments.

About James Sanders

James Sanders is a Java programmer specializing in software as a service and thin client design, and virtualizing legacy programs for modern hardware.
