Date Added: Mar 2012
A Grid is a collection of heterogeneous resources connected through a network to execute complex jobs with high processing-power requirements, which makes it particularly vulnerable to faults. Faults may degrade the performance and quality of service (QoS) of the Grid. They are dealt with either by avoiding them or by recovering from them, through re-execution of the job or by resuming execution from the point of failure using checkpoints. The various fault tolerance techniques combine resource management and job scheduling services with a checkpointing scheme. Different techniques target different kinds of faults, and each has its own advantages and limitations.
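
The difference between the two recovery strategies mentioned above can be illustrated with a minimal sketch. It is a toy model, not the method of any particular Grid middleware: the function `run_job` and its parameters `total_steps`, `fail_at`, and `checkpoint_every` are hypothetical names introduced here, and the job is assumed to fail exactly once.

```python
def run_job(total_steps, fail_at, checkpoint_every=None):
    """Simulate a job that fails once when it reaches step `fail_at`.

    Returns the total number of steps executed, including repeated work.
    With checkpoint_every=None the job is re-executed from the start;
    otherwise it resumes from the most recent checkpoint.
    (All names here are illustrative assumptions.)
    """
    executed = 0
    last_checkpoint = 0      # step saved by the most recent checkpoint
    step = 0
    failed_once = False
    while step < total_steps:
        if step == fail_at and not failed_once:
            failed_once = True
            step = last_checkpoint   # roll back to saved state (0 if none)
            continue
        step += 1
        executed += 1
        # Save progress at regular intervals when checkpointing is enabled.
        if checkpoint_every and step % checkpoint_every == 0:
            last_checkpoint = step
    return executed


reexecution_cost = run_job(total_steps=10, fail_at=7)
checkpoint_cost = run_job(total_steps=10, fail_at=7, checkpoint_every=3)
print(reexecution_cost, checkpoint_cost)
```

In this toy model, checkpointing repeats only the work done since the last checkpoint, whereas re-execution repeats everything done before the failure. The sketch ignores the cost of writing checkpoints, which a real scheme would have to weigh against the work saved.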