Communication-Aware Fault-Tolerant Scheduling Strategy for Precedence Constrained Tasks in Heterogeneous Distributed Systems
Fault-tolerant scheduling is an important issue for heterogeneous distributed systems because they are subject to a wide range of resource failures. The primary-backup approach is a common fault-tolerance methodology in which each task has a primary copy and a backup copy on two different processors. For independent tasks, a backup copy can overload (share a time slot) with other backup copies on the same processor, as long as their corresponding primary copies are scheduled on different processors. Unfortunately, most scheduling algorithms have been developed on a simple model in which communication contention is not taken into account.
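The overloading condition for independent tasks stated above can be illustrated with a minimal sketch. It is not the scheduling algorithm of this paper. The names `Copy`, `overlaps`, and `may_overload` are hypothetical, and the sketch assumes that at most one processor fails at a time, which is what makes sharing a slot safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One scheduled copy (primary or backup) of a task. Hypothetical type."""
    task: str
    processor: int
    start: float
    finish: float


def overlaps(a: Copy, b: Copy) -> bool:
    """Return True if the two copies occupy intersecting time intervals."""
    return a.start < b.finish and b.start < a.finish


def may_overload(primary_a: Copy, backup_a: Copy,
                 primary_b: Copy, backup_b: Copy) -> bool:
    """Check whether two backup copies may share time on one processor.

    Assumption: at most one processor fails at a time, so primaries on
    different processors never fail together and at most one of the two
    backups ever has to run.
    """
    # Each task keeps its primary and backup on two different processors.
    if primary_a.processor == backup_a.processor:
        return False
    if primary_b.processor == backup_b.processor:
        return False
    # Overloading only concerns backups placed on the same processor.
    if backup_a.processor != backup_b.processor:
        return False
    # The condition from the text: the primaries sit on different processors.
    return primary_a.processor != primary_b.processor


# Usage: tasks t1 and t2 have primaries on processors 0 and 1 and both
# backups on processor 2, in intersecting time intervals.
p1 = Copy("t1", processor=0, start=0.0, finish=4.0)
p2 = Copy("t2", processor=1, start=0.0, finish=3.0)
b1 = Copy("t1", processor=2, start=5.0, finish=9.0)
b2 = Copy("t2", processor=2, start=6.0, finish=9.0)
shared_slot_allowed = overlaps(b1, b2) and may_overload(p1, b1, p2, b2)
```

In this usage example the two backups intersect in time on processor 2, and the check permits it because the primaries run on processors 0 and 1. If both primaries were on processor 0, a single failure would activate both backups at once, so the check would reject the shared slot.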