Latent Fault Detection in Cloud Services
Large-scale Internet cloud services comprising thousands of computers are ubiquitous. With so many machines, it is unreasonable to assume that all of them are working properly and are correctly configured. If faults go unnoticed, they can accumulate to the point where redundancy and fail-over mechanisms break down. Detecting latent faults is therefore essential for preventing failures and increasing the reliability of cloud services. This paper provides evidence that latent faults are common. The authors show that these faults can be detected with high precision using domain-independent techniques.
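The abstract does not describe the detection method itself. As a rough, hypothetical illustration of what a domain-independent check might look like, the sketch below assumes that machines running the same service should report similar performance counters. It flags machines whose counters deviate strongly from their peers, using a robust median/MAD score. The function name `flag_outliers`, the threshold of 3.0, and the choice of statistic are all assumptions for illustration, not the authors' technique.

```python
import statistics


def flag_outliers(counters, threshold=3.0):
    """Hypothetical peer-comparison check (not the paper's method).

    counters: dict mapping machine name -> list of floats, one entry per
    performance counter, with the same length for every machine.
    Returns the sorted names of machines whose largest robust deviation
    across counters exceeds `threshold`.
    """
    names = list(counters)
    n_counters = len(next(iter(counters.values())))

    # Per-counter median and median absolute deviation (MAD) across machines.
    medians, mads = [], []
    for j in range(n_counters):
        column = [counters[m][j] for m in names]
        med = statistics.median(column)
        mad = statistics.median(abs(x - med) for x in column)
        medians.append(med)
        # Avoid division by zero when all machines agree on a counter.
        mads.append(mad if mad > 0 else 1e-9)

    flagged = []
    for m in names:
        # Score each machine by its worst robust z-score over all counters.
        score = max(abs(counters[m][j] - medians[j]) / mads[j]
                    for j in range(n_counters))
        if score > threshold:
            flagged.append(m)
    return sorted(flagged)


counters = {
    "m1": [10.0, 200.0],
    "m2": [11.0, 210.0],
    "m3": [9.5, 190.0],
    "m4": [10.5, 205.0],
    "m5": [30.0, 200.0],  # first counter is far from its peers
}
print(flag_outliers(counters))  # ['m5']
```

Because the check compares machines only with each other, it needs no model of what each counter means. That property is what makes this kind of approach domain-independent, though a real detector would also have to handle noise, workload imbalance, and time.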