Date Added: Mar 2012
Fault tolerance is one of the most important issues in achieving dependable distributed systems. There is a large body of work on failure detection in networked systems where failures are transient. During a crash period, the server is unable to service any requests. To detect such failures, the authors use a number of failure detectors whose objectives differ. Most failure detectors focus on providing fast and accurate detection of all failure events. In this paper, they propose MASS, an algorithm that acts as a failure detector: it detects a failure and recovers from it. When a server crashes, its services are redirected to another active server, chosen on the basis of the minimum number of users and the maximum amount of bandwidth.
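The redirection rule in the last sentence can be illustrated with a minimal sketch. This is not the MASS algorithm itself, whose details the abstract does not give. The `Server` record, its field names, and the function `pick_failover_target` are hypothetical, and the way the two criteria are combined (fewest users first, with higher bandwidth breaking ties) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Server:
    """Hypothetical view of a server as seen by a failure detector."""
    name: str
    alive: bool            # False once the detector suspects a crash
    users: int             # number of users currently served
    bandwidth_mbps: float  # currently available bandwidth


def pick_failover_target(servers: Sequence[Server]) -> Optional[Server]:
    """Choose an active server to take over a crashed server's services.

    Assumption: prefer the fewest users, and among servers with equally
    few users prefer the most bandwidth. The abstract names both criteria
    but does not say how they are weighted.
    """
    candidates = [s for s in servers if s.alive]
    if not candidates:
        return None  # no active server is left to redirect to
    return min(candidates, key=lambda s: (s.users, -s.bandwidth_mbps))


# Illustrative data only; server "a" is treated as crashed.
servers = [
    Server("a", alive=False, users=0, bandwidth_mbps=1000.0),
    Server("b", alive=True, users=12, bandwidth_mbps=100.0),
    Server("c", alive=True, users=5, bandwidth_mbps=200.0),
    Server("d", alive=True, users=5, bandwidth_mbps=400.0),
]
target = pick_failover_target(servers)
```

With this data the crashed server is skipped even though it reports the fewest users, and the tie between the two least-loaded active servers is resolved in favor of the one with more bandwidth. A weighted score of both criteria would be an equally plausible reading of the abstract.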