Latent Fault Detection in Cloud Services

Large-scale Internet cloud services comprising thousands of computers are ubiquitous. With so many machines, it is not reasonable to assume that all of them are working properly and are correctly configured. If faults are left unnoticed, they may accumulate to the point where redundancy and fail-over mechanisms break. Detecting latent faults is therefore essential for preventing failures and increasing the reliability of cloud services. This paper provides evidence that latent faults are common. The authors show that these faults can be detected with high precision using domain-independent techniques.

Provided by: Microsoft Research
Topic: Cloud
Date Added: Jul 2011
Format: PDF
