Date Added: Jun 2011
Upgrading the software of long-lived, highly available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting the system for an upgrade is unacceptable. Instead, upgrades must happen gradually, and there may be long periods when different nodes run different software versions and need to communicate using incompatible protocols. The authors present a methodology and infrastructure that address these challenges and make it possible to upgrade distributed systems routinely while limiting service disruption.