Task Reallocation for Maximal Reliability in Distributed Computing Systems With Uncertain Topologies and Non-Markovian Delays

Executive Summary

The ability to model and optimize reliability is central to designing survivable Distributed Computing Systems (DCSs) in which servers are prone to permanent failure. In this paper, the service reliability of a DCS with an uncertain topology is characterized analytically using a novel regeneration-based probabilistic analysis. The analysis accounts for the stochastic failure times of servers, the heterogeneity and randomness of both service times and communication delays, and arbitrary task-reallocation policies.

  • Format: PDF
  • Size: 343.8 KB