Applying Feedback Control to a Replica Management System
Many modern storage systems used for large-scale scientific computing are multi-use, independently administered clusters or grids. A common technique for maintaining storage reliability over long periods is to create data replicas on multiple servers, but when servers fail, ongoing corrective action is needed to prevent the loss of both high-value and low-value data. Such a system is difficult to control, and replica management is typically handled in an ad hoc manner. In this work, the authors argue that repairing prioritized faults is a scheduling problem, grounded in the need to minimize a risk-based error function, E.
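The abstract does not define E. The sketch below only illustrates what "repair scheduling as risk minimization" can mean, and every modeling choice in it is a hypothetical assumption. E is taken to be the sum over data objects of value times the probability that all surviving replicas are lost, with independent server failures of probability `p_fail`. Repairs are ordered by a simple greedy rule, which is a stand-in and not the feedback-control method the paper proposes. The names `DataObject`, `error`, and `schedule_repairs` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    value: float   # hypothetical importance weight (high- vs low-value data)
    replicas: int  # surviving replicas after server failures
    target: int    # desired replica count


def error(objects, p_fail):
    # Hypothetical risk-based error: E = sum_i value_i * p_fail ** replicas_i,
    # i.e. expected lost value if each replica fails independently with p_fail.
    return sum(o.value * p_fail ** o.replicas for o in objects)


def schedule_repairs(objects, p_fail, budget):
    """Greedy stand-in scheduler (NOT the paper's controller): spend `budget`
    replica copies, each on the object whose extra replica reduces E the most.
    Mutates `replicas` in place and returns the repair order."""
    plan = []
    for _ in range(budget):
        # Objects with zero replicas are already lost and cannot be repaired.
        candidates = [o for o in objects if 1 <= o.replicas < o.target]
        if not candidates:
            break

        def gain(o):
            # Reduction in E from adding one replica to object o.
            return o.value * (p_fail ** o.replicas - p_fail ** (o.replicas + 1))

        best = max(candidates, key=gain)
        best.replicas += 1
        plan.append(best.name)
    return plan


if __name__ == "__main__":
    objs = [DataObject("A", 10.0, 1, 3), DataObject("B", 2.0, 1, 3),
            DataObject("C", 5.0, 2, 3)]
    print("E before:", error(objs, 0.1))
    print("plan:", schedule_repairs(objs, 0.1, 3))
    print("E after:", error(objs, 0.1))
```

With these made-up numbers the high-value, single-replica object "A" is repaired first. "B" is repaired next because its marginal risk reduction then exceeds that of "A", and E falls from about 1.25 to about 0.08. A real system would replace the greedy loop with whatever scheduling or control policy the paper defines.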