Researchers from NICTA (National ICT Australia), Australia’s Information and Communications Technology Centre of Excellence, are attempting to remove errors from software:
“Almost every paper on formal verification starts with the observation that software complexity is increasing, that this leads to errors, and that this is a problem for mission and safety critical software. We agree, as do most.”
My first thought: No way. How can you eliminate every single error? Some of the more complex software packages must have at least a zillion lines of code.
The NICTA people are well aware of the complexity. That’s why they’re focusing on small embedded system software to start with.
Every time my father calls with a computer problem, he points out that it’s criminal how we accept buggy software: “It’ll be a cold day in Hell before I’d buy a car like that.”
Fast-forward to a recent phone call from my dad. I didn’t know whether to laugh or cry. His car was recalled for a software update.
NICTA researchers mentioned:
“It does not have to be that way. Your company is commissioning a new vending software…. You write down in a contract precisely what the software is supposed to do.
And then–it does. Always. And the developers can prove it to you–with an actual mathematical machine-checked proof.”
Sounds too good to be true, doesn’t it? Still, academics do not make unsubstantiated claims. So, what do the NICTA team members have up their collective sleeves? I contacted NICTA, and Dr. June Andronick volunteered to explain what they had learned.
Kassner: How did you first get interested in code verification? And is it as daunting as it seems?
Andronick: I come from a math background. I started writing code with the mindset where you think about why your program should behave as you expect. For instance, “Why exactly should it terminate?”
I found formal code verification a fascinating way of combining the two worlds…writing precisely what you expect from the program…and then proving that it does.
Some of this is intuitive: Programmers usually have the gut feeling of why their program does what they want. Code verification just formalises this reasoning, and has it machine-checked, leaving no place for doubt.
It may sound daunting, but it is actually a lot of fun and addictive. It’s like a game between you and the proof tool — you trying to convince it why you think something is true.
Kassner: You used something called the seL4 microkernel to test your theories. Why did you select this kernel?
Andronick: The goal of the L4.verified project, led by Professor Gerwin Klein, was to formally verify seL4, a new microkernel designed by Dr. Kevin Elphinstone. It is the first step towards the long-term vision of our group, headed by Professor Gernot Heiser, to produce truly trustworthy software for which you can provide strong guarantee about its security, safety, and reliability.
The choice to tackle the kernel first is driven by one main reason: It is the most critical part of the system, residing between the hardware and the rest of the software. It has full access to all resources and controls what other software can access.
Any behavior of the system will rely on all aspects of kernel functionality. So any guarantee about the system will have to start with the kernel being functionally correct.
Because there is no protection against faults occurring within the kernel, any bug there can potentially cause arbitrary damage. The concept of microkernel comes from reducing the amount of such critical code to a minimum, reducing the “trusted computing base”.
The result of the project was a formal proof of seL4’s functional correctness. This means that, under the assumptions of the proof, the kernel can never crash or otherwise behave unexpectedly.
This was the first formal proof of functional correctness of a general purpose operating system kernel, and more generally of any real-world application of significant size.
Kassner: 8700 lines of C and 600 lines of assembler seem like a lot of code to check. Is that what the following slide depicts? Could you please explain what we are looking at?
Andronick: This picture shows seL4’s so-called function call graph. Each C function in the kernel is a dot. If function A calls function B, there is an arrow from A to B.
The graph shows that seL4 is highly complex software with strongly interconnected parts. This is typical for performance-optimized microkernels.
Well-engineered application-level software would have groups or islands of strongly related dots connected by a small number of arrows bridging between the islands.
Kassner: You use the terms formal verification and functional correctness. Could you describe what they each mean and what roles they play?
Andronick: Formal veriﬁcation refers to the application of mathematical proof techniques to establish properties about programs. It can cover not just all lines of code or all decisions in a program, but all possible behaviours for all possible inputs.
This exhaustive coverage of all possible scenarios is what differentiates it from testing, which can only find bugs, not prove the absence of bugs.
Functional correctness means that the kernel behaves as expected in its specification. This property is stronger and more precise than what automated techniques like model checking or static analysis can achieve.
We don’t only analyse speciﬁc aspects of the kernel, such as safe execution, but also provide a full speciﬁcation and proof for the kernel’s precise behaviour.
The approach we use is interactive theorem proving. It requires human intervention and creativity to construct and guide the proof, but has the advantage that it is not constrained to speciﬁc properties or ﬁnite, feasible state spaces.
Kassner: I now understand what you are looking for. Could you give a brief overview of how you try to find problems?
Andronick: A problem is detected when the code does not behave as the specification prescribes.
The process starts by writing a formal specification of what the kernel is supposed to do. For instance, you can require that a sorting function returns a list that is sorted, with the same elements than the input list. The code will describe how this functionality is implemented, as choosing one sorting algorithm.
Then you need to prove that the result of the function will always satisfy the specification requirement. Again, the key differentiator to testing is that we reason about all possible inputs. If the specification does not hold for some inputs, the proof will reveal it. The bug will be fixed and the proof attempt will resume. When the proof is finished, you know that there are no implementation bugs left.
Kassner: What is the next step for the verification process? Can this procedure be used to test computer-operating systems we are used to?
Andronick: The exact same approach won’t scale to systems comprising a million lines of code such as modern operating systems. But, we have a plan for large complex systems.
The first thing to note is that formal verification is not cheap. We spent signiﬁcant eﬀort on the veriﬁcation of about 10,000 lines of code. It still appears cost-eﬀective and more aﬀordable than other methods that achieve lower degrees of trustworthiness.
The main message is that such thorough methods really make sense in critical systems (medical, automotive, defense, etc).
Our approach for large critical systems comes from the observation that in such systems, not all of the code is critical. For example, medical devices have large user interfaces and airplanes have entertainment software (Prof. Heiser wrote a relevant blog illustrating the need for a verified kernel using in-flight entertainment systems as an example).
The key idea of our current work is to isolate the non-critical parts from the critical ones. This is done by using seL4 and its access control. This enables us to prove that a bug in the non-critical parts cannot prevent the critical parts from behaving correctly. This way we can concentrate the verification effort on the critical parts of the system.
In other words, we show that formal verification of functional correctness is practically achievable for OS microkernels, and we’re now working on using this trustworthy foundation to give strong guarantees about large critical systems.
Kassner: Funny you should mention entertainment software for airlines. While on a recent red-eye to Amsterdam, the cabin attendants were running up and down the aisles, apologizing. The entertainment system was down. “It should be up soon.” They said, “The captain is rebooting a computer.”
My burning question: “What else does that computer control?” I was assured that everything was under control.