Fixing the individual bugs behind the vulnerabilities that hackers use to attack systems is important, but it’s much more effective to block the techniques attackers use to exploit those vulnerabilities and remove an entire class of exploits — or at least make them more expensive and time-consuming to create.

Return-oriented programming (ROP) has been a very common technique that’s particularly hard to block. Instead of trying to inject their own code into running processes (something operating systems and browsers have been adding defences against), attackers look for small chunks of the legitimate code that’s already in memory that end in ‘returns’ — the instructions that send execution back to whatever routine made the call — and chain those chunks together.
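That stitching can be sketched as a toy model. The gadget addresses and operations below are invented for illustration; a real chain points at machine-code fragments ending in `ret`, not Python functions.

```python
# Toy ROP sketch: the attacker adds no code of their own, only a list of
# addresses of existing "gadgets" (short instruction sequences ending in a
# return) written onto the stack. Gadgets and addresses here are invented.

GADGETS = {                                    # legitimate code already in memory
    0x1000: lambda st: st.append(7),           # e.g. "push a constant; ret"
    0x2000: lambda st: st.append(st.pop() * 2),# e.g. "double the top value; ret"
}

def run_rop_chain(chain):
    scratch = []
    for ret_addr in chain:      # each faked return address runs one gadget
        GADGETS[ret_addr](scratch)
    return scratch

# the attacker "computes" 14 using only code that was already there
assert run_rop_chain([0x1000, 0x2000]) == [14]
```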

“With ROP, I can’t create new code; I can only jump around to different pieces of code and try to string that together into a payload,” Dave Weston, director of OS security at Microsoft told TechRepublic. If the legitimate code has a memory safety bug like a buffer overflow, corrupting the return addresses on the program’s call stack means that, instead of going back to the caller, execution jumps to addresses of the attacker’s choosing.
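The corruption step can be sketched with a simulated call stack. The layout and addresses here are invented: a real stack holds raw bytes and grows downwards, but the principle — an unchecked copy runs past the buffer into the saved return address — is the same.

```python
# Toy model of a stack-smashing bug: a fixed-size buffer sits on a simulated
# call stack next to the saved return address. Copying attacker input with no
# bounds check overwrites the return slot, so "returning" transfers control
# to an attacker-chosen address. Addresses are illustrative, not real layout.

def vulnerable_call(stack, attacker_input):
    # buffer occupies stack[0:8]; the saved return address lives at stack[8]
    for i, byte in enumerate(attacker_input):   # no bounds check: the bug
        stack[i] = byte
    return stack[8]                             # "return" to whatever is there

clean = [0] * 8 + [0x401000]            # 0x401000 = the legitimate return address
assert vulnerable_call(clean[:], [0x41] * 8) == 0x401000        # input fits: fine

overflow = [0x41] * 8 + [0xBADC0DE]     # 9th element clobbers the return slot
assert vulnerable_call(clean[:], overflow) == 0xBADC0DE         # attacker wins
```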

Microsoft has been working on ways to stop attackers hijacking the flow of control in programs like this since around 2012. Windows has added multiple levels of protection, starting with signing important code (Code Integrity Guard, or CIG) and blocking runtime code generation first in the browser and then in VMs and the kernel (Arbitrary Code Guard, or ACG).

“The goal there is to prevent the attacker from loading a binary that Microsoft or one of our third parties didn’t sign; even if they are able to exploit the process and get memory corruption in the process, they can’t inject shellcode or other constructs,” Weston explained.

That defence was effective enough to push attackers to use ROP, so the next step was trying to protect the flow of control within the program. Control flow integrity arrived in Windows 8.1 as Control Flow Guard (CFG). This blocks forward control-flow attacks (where the code makes an indirect jump or call and attackers try to send it to the wrong place).

“At compile time, we take a record of all the indirect transfers or jumps or calls that the software developer intends the code to make, and that map is passed to the kernel when you load the binary and it’s enforced when the code runs,” Weston said. If an attacker does manage to send the code to an address that isn’t on the map, the process is terminated: an infected app will crash, but it won’t run the malicious code.
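A minimal sketch of that check, with an invented target map — a real CFG bitmap covers valid call-target addresses in the loaded binary and is enforced by the kernel, not Python function ids:

```python
# Sketch of the CFG idea: at "compile time" we record the set of valid
# indirect-call targets; at run time every indirect transfer is checked
# against that map, and the process is terminated on a miss.

def payroll():   return "payroll ran"
def settings():  return "settings ran"
def shellcode(): return "attacker code ran"   # not in the compiled-in map

VALID_TARGETS = {id(payroll), id(settings)}   # stands in for the CFG bitmap

def guarded_indirect_call(target):
    if id(target) not in VALID_TARGETS:
        # crash rather than run: the app dies, the payload never executes
        raise SystemExit("CFG violation: process terminated")
    return target()

assert guarded_indirect_call(payroll) == "payroll ran"
try:
    guarded_indirect_call(shellcode)          # a corrupted function pointer
    raise AssertionError("should have been terminated")
except SystemExit:
    pass
```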

CFG is the reason that several key zero-day attacks on Windows 7 didn’t affect Windows 10. But, as Weston noted, 2015 is a long time ago in security terms, and CFG only addresses part of the problem. “Attackers have actually started to corrupt the stack, injecting their ROP frames or their malicious instruction sets.” By interfering with the execution flow when it returns to the main thread, rather than when it jumps forward, they can bypass CFG and execute their own code when the thread should go back.

Call and return

It’s not that Microsoft didn’t know that could happen; it’s just harder to protect against and the best option is to do it in hardware, with a special register in the CPU that keeps a copy of the return address where it can’t be tampered with. When the chunk of code with the return instruction runs, it can compare the address on the call stack in memory with the address on the ‘shadow’ stack stored on the processor to check that it hasn’t been tampered with.
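A minimal simulation of that compare-on-return check, assuming only what is described above — in real silicon the shadow stack sits behind the CPU’s shadow stack pointer and raises a control-protection fault, not a Python exception:

```python
# Sketch of a hardware shadow stack: every call pushes the return address
# onto both the normal stack (attacker-writable memory) and a shadow stack
# (protected); every return compares the two and stops on a mismatch.

class ShadowStackCPU:
    def __init__(self):
        self.call_stack = []     # ordinary, corruptible memory
        self.shadow_stack = []   # protected copy the attacker can't reach

    def call(self, return_address):
        self.call_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        addr = self.call_stack.pop()
        if addr != self.shadow_stack.pop():
            raise SystemExit("control-flow violation: stack was tampered with")
        return addr

cpu = ShadowStackCPU()
cpu.call(0x401234)
assert cpu.ret() == 0x401234          # normal return: the addresses match

cpu.call(0x401234)
cpu.call_stack[-1] = 0xBADC0DE        # ROP-style overwrite of the real stack
try:
    cpu.ret()
    raise AssertionError("should have faulted")
except SystemExit:
    pass                              # mismatch caught, payload never runs
```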

Designing new CPU instructions takes time, and even once those ship it takes a while before people buy new hardware, so Microsoft did attempt to create a shadow stack in software. (This was far from the first attempt to create a shadow stack; there’s one implemented in Clang that Chrome used for a time.) Unusually, the approach, which would have become Return Flow Guard, was designed not by the usual software engineers but by the Windows red team — the group that attacks internal and insider builds of Windows to look for vulnerabilities. But when the same team looked at how they could attack their own design, they found a race condition that meant some apps weren’t protected, and decided not to ship it at all.

“The challenge with doing a shadow stack in software is that you have two choices: you can try to hide it, or you can try to put it in a place where the attacker can’t write, and ultimately that comes down to if you can modify the page table or if you can locate it in memory if things go awry,” Weston explained. “We attempted to hide it somewhere in 64-bit memory by wrapping it in guard pages, so if someone did like an iterative search through memory they would hit a guard space first and crash the process before finding the shadow stack.” But on high-performance multi-threaded apps, attackers could sometimes make the kernel skip over the check to see if the return address matched the address on the shadow stack.

“When we have to do it in software, we have to introduce ‘no ops’; when you’re entering and exiting the function, we pad them with blanks and so people are able to massage the memory, people are able to massage the race conditions of the system and skip the checks completely,” Hari Pulapaka, principal group program manager of the Windows kernel team, explained. There’s no race condition when the shadow stack is stored in hardware, so the checks don’t get skipped.

Microsoft and Intel worked together on a design called Control-flow Enforcement Technology (CET) several years ago, which adds the new Shadow Stack Pointer (SSP) register and modifies the standard CPU call and return instructions to store a copy of the return address and compare it to the one in memory — so most programs won’t need any changes for compatibility. If the two addresses don’t match, which means the stack has been interfered with, the code will stop running.

“The shadow page table is assigned in a place that most processes or even the kernel cannot access, and this is supported by a new page table attribute that is not even exposed right now and people can’t query it either,” Pulapaka said. “The idea is that you will not be able to see that it exists, and you will not be able to touch it — and if you try to touch it, the kernel doesn’t allow any arbitrary process to touch it.”


CET also includes some forward call protection: indirect branch tracking does a similar check to CFG, but in hardware. The CET specification was first released in 2016 and, for compatibility, silicon released since then has treated the new instruction that marks indirect branch targets as safe as a no-op.
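That backward-compatibility trick can be sketched like this. The marker name echoes Intel’s `endbr64` instruction; everything else — the code layout and addresses — is invented for illustration.

```python
# Sketch of indirect branch tracking: indirect jumps and calls must land on
# an "endbranch" marker instruction. On CPUs without CET the same instruction
# decodes as a harmless no-op, which is how binaries built with the marker
# still run on older silicon.

ENDBR = "endbr64"

code = {
    0x500: [ENDBR, "mov", "ret"],   # a legitimate, marked call target
    0x600: ["pop", "ret"],          # a mid-function ROP gadget, no marker
}

def indirect_branch(cpu_has_cet, target):
    if cpu_has_cet and code[target][0] != ENDBR:
        raise SystemExit("control-flow violation: missing endbranch")
    return "executed"

assert indirect_branch(False, 0x600) == "executed"  # old CPU: marker is a no-op
assert indirect_branch(True, 0x500) == "executed"   # CET CPU, valid target
try:
    indirect_branch(True, 0x600)                    # CET CPU, gadget address
    raise AssertionError("should have faulted")
except SystemExit:
    pass
```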

Intel confirmed to us that CET will be included in Tiger Lake CPUs this year, and in the next generation of Xeon for servers. AMD didn’t give a date, but told us it will have the equivalent of CET soon. Arm is taking a different approach, using signed pointers.

Compatible and secure

Microsoft has already started building support into Windows 10, starting with 1903 and completing it in the upcoming 2004 release, so it’s been showing up in fast ring insider builds. It’s not enabled because the hardware isn’t widely available yet, but it’s there to test compatibility, Pulapaka explained. “When an insider build has all these checks going on inside the kernel, it gives us confidence we haven’t broken anything and we haven’t caused any bugs.”

To avoid compatibility worries with third-party software, CET stack protection will initially be opt-in on Windows. Developers do that by setting an attribute on an app or a DLL with a linker flag to mark it as CET-compatible. This has been done for all Windows code and libraries and, Pulapaka explained, “if somebody tries to attack Windows code and we trip the CET tripwire, we will bring down the process.”

If developers don’t set that bit, CET won’t kick in. And even when they set it for their own code, calling a third-party framework or library that doesn’t have the CET flag set won’t bring the application down: if that library fails the CET address check, Windows won’t stop the original application.

“We’re being a little conservative to avoid breaking apps,” Pulapaka said. But Windows could also run in a strict mode. “If an app says it’s CET-compatible even if the third-party DLL it loads is not CET-compatible, in that mode we would still do all the checks on that DLL and crash the process if somebody tries to attack that process.”
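The two policies Pulapaka describes reduce to a small decision table. The flag and mode names below are simplifications invented for this sketch, not Windows API names.

```python
# Sketch of the two CET enforcement policies: each loaded module carries a
# CET-compatible bit. In the conservative default mode, a shadow-stack
# mismatch inside a module without the bit is tolerated so the app keeps
# running; in strict mode the check is enforced everywhere.

def on_shadow_stack_mismatch(module_is_cet_compatible, strict_mode):
    """What the OS does when the shadow-stack check fails in a module."""
    if module_is_cet_compatible or strict_mode:
        return "terminate process"   # trip the tripwire, bring the process down
    return "continue"                # conservative default: don't break the app

# Windows' own code always has the bit set, so an attack crashes the process
assert on_shadow_stack_mismatch(True, strict_mode=False) == "terminate process"
# a third-party DLL without the flag, default mode: the app keeps running
assert on_shadow_stack_mismatch(False, strict_mode=False) == "continue"
# the same DLL under strict mode: the checks still apply
assert on_shadow_stack_mismatch(False, strict_mode=True) == "terminate process"
```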

Microsoft hasn’t yet decided how that mode would be applied because hardware isn’t available for developers and enterprises to test applications on. “We would want to provide flexibility to everybody, so we would want the app to own the policy decision, we would want the enterprise to own the policy decision and we would want Microsoft to own the policy decision as well,” said Pulapaka. “I think it is too early for us to say what we would turn on or off or force by default, because we don’t yet have the hardware.”

Pulapaka expects compatibility problems with CET to be rare, but given the size of the Windows ecosystem, some apps may run into problems. Those are most likely to be sophisticated tools like debuggers, JIT code generators, DRM, code obfuscators or anti-cheat engines for games — software that relies on low-level assembly code.

“If they have some weird code that tries to mess with the stack pointers, they could get tripped up. That’s why we want to start with this more conservative approach and see how it goes; ninety-nine percent of the software world would probably not need to worry about whether their apps need some extra special testing with CET.”

When developers and enterprises have the right hardware to test on and do want to adopt CET, they can set the linker flag in Visual Studio and use the same binary analysis tool that Microsoft uses to scan each Windows build to make sure that the CET flag is set on all code.

Protecting code flow in hardware is the best option for security, and it ought to be better for performance than adding checks in Windows. Until Tiger Lake is available it’s impossible to give real figures, but “it will certainly be way better than doing it in software because, by definition, doing it in hardware is much faster,” Pulapaka told TechRepublic.

That’s important because the hardware shadow stack is the protection Windows has been waiting several years for — the one that completes Microsoft’s set of four code protections.

“These things are only truly effective when they’re combined,” Weston pointed out, “but when those protections are combined, we mitigate most of the in-the-wild techniques we see today. When it comes to the x86 landscape, we think CET is possibly the most important mitigation that’s come online for memory corruption and zero day exploits, in the last several years.”

As always, improving protection in one area pushes attackers to switch techniques — but this is still a big step forward.

“Data corruption is emerging as the future path for attackers: we know internally that you can write an exploit that bypasses all four of these guards with pure data corruption,” Weston said. “That doesn’t mean CET isn’t incredibly valuable, because that’s a bit like open heart surgery and is going to be really disruptive for attackers, but we’re already moving towards a post four-guards world where we’ve started to think about the next set of challenges around data corruption.”

