Fault Tolerant Environment Using Hardware Failure Detection, Roll Forward Recovery Approach and Microrebooting for Distributed Systems
The Fault Tolerant Environment is a complete programming environment for the reliable execution of distributed application programs. The environment encompasses all aspects of modern fault-tolerant distributed computing. Its built-in, user-transparent error detection mechanism covers processor node crashes and transient hardware failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism, combined with a novel low-overhead roll-forward recovery scheme, delivers an efficient backup and recovery mechanism for distributed processes.
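To make the detection model concrete, the following is a minimal sketch of how node-crash detection and user-assisted error checks might feed a single failure decision. It is an illustration only, not the environment's actual mechanism: the heartbeat-timeout approach, the `FailureDetector` class, its method names, and the timeout value are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailureDetector:
    """Hypothetical detector: heartbeat timeouts plus user-supplied checks."""

    timeout: float  # seconds of silence before a node is suspected (assumed)
    last_seen: Dict[str, float] = field(default_factory=dict)
    user_checks: List[Callable[[str], bool]] = field(default_factory=list)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the most recent time the node reported in.
        self.last_seen[node] = now

    def add_check(self, check: Callable[[str], bool]) -> None:
        # A user-assisted check returns True when the node looks healthy.
        self.user_checks.append(check)

    def failed_nodes(self, now: float) -> List[str]:
        # A node is reported as failed if it has been silent for longer
        # than the timeout or if any user-assisted check rejects it.
        failed = []
        for node, seen in self.last_seen.items():
            timed_out = now - seen > self.timeout
            check_failed = any(not check(node) for check in self.user_checks)
            if timed_out or check_failed:
                failed.append(node)
        return sorted(failed)


if __name__ == "__main__":
    detector = FailureDetector(timeout=2.0)
    detector.heartbeat("n1", now=0.0)
    detector.heartbeat("n2", now=0.0)
    detector.heartbeat("n1", now=3.0)  # n2 stays silent
    print(detector.failed_nodes(now=4.0))
```

Time is passed in explicitly rather than read from a clock so that the sketch stays deterministic; a real detector would also need to handle message delay and false suspicions, which are left out here.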