Resource Constrained Failure Management in Networked Computing Systems


Executive Summary

The authors examine fault detection in networked computing systems and highlight the tradeoff between diagnosing and reacting to potentially harmful real-time events and minimizing the number of times the system is reset or scanned for malicious activity. They model the system's health states as the states of a Markov chain and use a model-fitting approach to estimate the transition probabilities between those states. They then consider a scenario in which the system is deployed over a fixed horizon, with a limit on the number of times its health state can be scanned and the system reset. Each health state is assigned a cost that reflects the system's performance while in that state.
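The setup described above can be illustrated with a small sketch. It is not the authors' method: the summary does not specify their fitting procedure or their optimization, so this example swaps in a simple maximum-likelihood count estimate for the transition matrix and a finite-horizon dynamic program for the reset budget. It also makes simplifying assumptions that are not in the summary: the health state is observed at every step (scans are not budgeted here), a reset moves the system instantly to a hypothetical healthy state 0, and all function and parameter names are illustrative.

```python
import numpy as np

def estimate_transitions(sequence, n_states):
    """Maximum-likelihood transition matrix from an observed state sequence.

    Assumption: states are integers 0..n_states-1 and fully observed.
    Rows with no observed departures fall back to a uniform distribution.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(sequence[:-1], sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, counts / safe_sums, 1.0 / n_states)

def min_expected_cost(P, cost, horizon, budget, reset_state=0):
    """Minimal expected total cost over `horizon` steps with at most `budget` resets.

    Returns V where V[b, s] is the expected cost-to-go starting in state s
    with b resets available. Assumption: a reset is instantaneous, so the
    step is spent in `reset_state` instead of the current state.
    """
    cost = np.asarray(cost, dtype=float)
    V = np.zeros((budget + 1, len(cost)))
    for _ in range(horizon):
        new_V = np.empty_like(V)
        for b in range(budget + 1):
            # Option 1: do nothing, pay the current state's cost, then transition.
            stay = cost + P @ V[b]
            new_V[b] = stay
            if b > 0:
                # Option 2: spend one reset and continue from the reset state.
                reset = cost[reset_state] + P[reset_state] @ V[b - 1]
                new_V[b] = np.minimum(stay, reset)
        V = new_V
    return V

if __name__ == "__main__":
    # Hypothetical two-state example: 0 = healthy (cost 0), 1 = degraded (cost 1).
    P = np.array([[0.9, 0.1],
                  [0.0, 1.0]])
    V = min_expected_cost(P, cost=[0.0, 1.0], horizon=3, budget=1)
    print(V)
```

In this toy model the degraded state is absorbing, so a single available reset sharply lowers the expected cost of starting in it. A limit on scans would require a partially observed formulation, which this sketch deliberately leaves out.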

  • Format: PDF
  • Size: 384.23 KB