Does Erasure Coding Have a Role to Play in My Data Center?
Replication has become the de facto standard for storing data within and across data centers that process data-intensive workloads. Erasure coding (a form of software RAID), although heavily researched and theoretically more space-efficient than replication, has complex tradeoffs that are not well understood by practitioners. Today's data centers run diverse foreground and background data-intensive workloads, and getting these tradeoffs right is becoming increasingly important. Through a series of realistic data center deployment scenarios and workload characteristics, coupled with the implementation of a prototype Hadoop library with erasure codec functionality, the authors revisit traditional metrics (performance and dollar cost), present new tradeoffs (power proportionality and complexity), and make recommendations on directions worth researching.