Improving Scalability and Fault Tolerance in an Application Management Infrastructure
This paper explores the challenges associated with distributed application management in large-scale computing environments. In particular, the authors investigate several techniques for extending Plush, an existing distributed application management framework, to provide improved scalability and fault tolerance without sacrificing performance. One of the main limitations of Plush is the structure of the underlying communication fabric. They explain how they incorporated the use of an overlay tree provided by Mace, a toolkit that simplifies the implementation of overlay networks, in place of the existing communication subsystem in Plush to improve robustness and scalability.