Total Recall: System Support for Automated Availability Management
Availability is a storage system property that is both highly desired and yet minimally engineered. While many systems provide mechanisms to improve availability - such as redundancy and failure recovery - how to best configure these mechanisms is typically left to the system manager. Unfortunately, few individuals have the skills to properly manage the trade-offs involved, let alone the time to adapt these decisions to changing conditions. Instead, most systems are configured statically and with only a cursory understanding of how the configuration will impact overall performance or availability. While this issue can be problematic even for individual storage arrays, it becomes increasingly important as systems are distributed - and absolutely critical for the wide area peer-to-peer storage infrastructures being explored.