Scalable and Reliable Systems Management for Cloud Computing
With the advent of cloud computing, large-scale, automated system management has become increasingly important for the successful and economical operation of computing resources. However, traditional monolithic system management solutions are designed to scale to only hundreds or, at most, thousands of systems. In this paper, we present Blue Eyes, a new system management solution with a multi-server scale-out architecture that can handle hundreds of thousands of systems. Blue Eyes achieves highly scalable and reliable system management by running many management servers in a distributed manner so that they work collaboratively on management tasks. In particular, we structure the management servers into a hierarchical tree to achieve scalability, and we replicate management information to secondary servers to provide reliability and high availability.
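As an illustration of the two ideas named above, a server tree for scalability and replication to a secondary server for availability, the following is a minimal sketch only. It is not the Blue Eyes implementation: the class `ManagementServer`, its fields, the methods `record`, `lookup`, and `count_systems`, and the host names are all hypothetical, and the replication shown is a simple synchronous copy chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class ManagementServer:
    """Hypothetical management server node (an assumption, not the Blue Eyes API)."""
    name: str
    children: List["ManagementServer"] = field(default_factory=list)
    secondary: Optional["ManagementServer"] = None
    alive: bool = True
    # Management information for the systems this server is primary for.
    primary_info: Dict[str, str] = field(default_factory=dict)
    # Copies of information held on behalf of another server.
    replica_info: Dict[str, str] = field(default_factory=dict)

    def record(self, system_id: str, status: str) -> None:
        """Store a status update and copy it to the secondary, if one is set."""
        self.primary_info[system_id] = status
        if self.secondary is not None:
            self.secondary.replica_info[system_id] = status

    def lookup(self, system_id: str) -> Optional[str]:
        """Read a status, falling back to the secondary's copy if this server is down."""
        if self.alive:
            return self.primary_info.get(system_id)
        if self.secondary is not None and self.secondary.alive:
            return self.secondary.replica_info.get(system_id)
        return None

    def count_systems(self) -> int:
        """Aggregate the number of managed systems over this server's subtree."""
        return len(self.primary_info) + sum(c.count_systems() for c in self.children)


def build_example() -> ManagementServer:
    """Build a two-leaf tree in which each leaf acts as the other's secondary."""
    leaf_a = ManagementServer("leaf-a")
    leaf_b = ManagementServer("leaf-b")
    leaf_a.secondary = leaf_b
    leaf_b.secondary = leaf_a
    root = ManagementServer("root", children=[leaf_a, leaf_b])
    leaf_a.record("host-1", "ok")
    leaf_a.record("host-2", "degraded")
    leaf_b.record("host-3", "ok")
    return root
```

In this sketch, the root never stores per-system state itself; it only aggregates over its children, which is one way a tree can keep the load on any single server bounded. If `leaf-a` is marked as not alive, `lookup` still returns the status of `host-2` from the copy held by `leaf-b`.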