Failure Resilience for Device Drivers
Studies have shown that device drivers and other extensions contain 3 to 7 times more bugs than other code and are therefore more likely to fail. In response, the authors present a failure-resilient operating system that can recover from dead device drivers and other critical components, primarily by monitoring malfunctioning components and replacing them on the fly, transparently to applications and without user intervention. The paper focuses on the post-mortem recovery procedure: the authors explain how their defect detection mechanism works, how the policy-driven recovery procedure operates, and how components are reintegrated after a restart.
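
The three stages named in the summary (defect detection, policy-driven recovery, and reintegration) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `Driver`, `RestartPolicy`, and `Monitor` are hypothetical, detection is modeled as a missed heartbeat, the policy is a simple cap on restarts, and reintegration is modeled as notifying dependent components of the fresh instance.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Driver:
    """Hypothetical stand-in for a driver process (not the paper's interface)."""
    name: str
    alive: bool = True
    generation: int = 0  # incremented each time the driver is replaced

    def heartbeat(self) -> bool:
        # A healthy driver answers the monitor's ping; a dead one does not.
        return self.alive


@dataclass
class RestartPolicy:
    """Assumed policy: stop restarting once a driver has failed too often."""
    max_restarts: int = 3

    def should_restart(self, failures: int) -> bool:
        return failures <= self.max_restarts


@dataclass
class Monitor:
    policy: RestartPolicy
    drivers: Dict[str, Driver] = field(default_factory=dict)
    failures: Dict[str, int] = field(default_factory=dict)
    subscribers: List[Callable[[Driver], None]] = field(default_factory=list)

    def register(self, driver: Driver) -> None:
        self.drivers[driver.name] = driver
        self.failures[driver.name] = 0

    def poll(self) -> List[str]:
        """Run one monitoring round and return the names of restarted drivers."""
        restarted = []
        for name, driver in list(self.drivers.items()):
            if driver.heartbeat():
                continue  # detection: nothing wrong with this driver
            self.failures[name] += 1
            if not self.policy.should_restart(self.failures[name]):
                continue  # policy: give up and leave the driver down
            # Recovery: replace the dead instance with a fresh one.
            fresh = Driver(name=name, generation=driver.generation + 1)
            self.drivers[name] = fresh
            # Reintegration: tell dependents about the new instance.
            for notify in self.subscribers:
                notify(fresh)
            restarted.append(name)
        return restarted
```

In this sketch, a dependent component (for example a network stack) would register a callback in `subscribers` and rebind to the new driver instance when notified, which is what keeps the failure invisible to applications. A real system would also have to deal with requests that were in flight when the driver died, which the sketch omits.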