Why Do Upgrades Fail And What Can We Do About It?


Executive Summary

Enterprise-system upgrades are unreliable and often cause downtime or data loss. Errors in the upgrade procedure, such as broken dependencies, are the leading cause of upgrade failures. The authors propose a novel upgrade-centric fault model, based on data from three independent sources, that focuses on the impact of procedural errors rather than software defects. They show that current approaches to upgrading enterprise systems, such as rolling upgrades, are vulnerable to these faults: the upgrade is not an atomic operation, so it risks breaking hidden dependencies among the distributed system components. The authors also present a mechanism for tolerating complex procedural errors during an upgrade.
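To make the non-atomicity point concrete, here is a toy sketch. It is not the authors' fault model or their fault-tolerance mechanism. It assumes a hypothetical compatibility rule: a component at version N can only talk to a dependency at version N or later, for example because of a new wire format. The functions `broken_edges` and `rolling_upgrade` and the component names `web` and `db` are illustrative inventions. The sketch upgrades components one at a time and records which dependency edges are broken after each step, exposing the mixed-version window that a rolling upgrade passes through.

```python
"""Toy model of the mixed-version window in a rolling upgrade.

Hypothetical assumption: a client at version N requires its server to be
at version >= N. This rule is illustrative only and not taken from the paper.
"""

from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (client, server): the client depends on the server


def broken_edges(versions: Dict[str, int], deps: List[Edge]) -> List[Edge]:
    """Return every dependency edge whose server is older than its client."""
    return [(client, server) for client, server in deps
            if versions[server] < versions[client]]


def rolling_upgrade(versions: Dict[str, int], deps: List[Edge],
                    order: List[str], target: int) -> List[List[Edge]]:
    """Upgrade components one at a time, in the given order.

    Returns, for each step, the list of edges that are broken in the
    intermediate state immediately after that component is upgraded.
    """
    state = dict(versions)  # copy so the caller's mapping is not mutated
    history: List[List[Edge]] = []
    for name in order:
        state[name] = target
        history.append(broken_edges(state, deps))
    return history


if __name__ == "__main__":
    start = {"web": 1, "db": 1}
    deps = [("web", "db")]  # a dependency the operator may not know about
    print(rolling_upgrade(start, deps, ["web", "db"], 2))
    print(rolling_upgrade(start, deps, ["db", "web"], 2))
```

Upgrading `web` before `db` leaves the system briefly in a state where the `web -> db` edge is broken, while the opposite order never breaks it. Because the upgrade proceeds step by step rather than atomically, the correctness of every intermediate state depends on knowing such dependencies in advance.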

  • Format: PDF
  • Size: 834.8 KB