Does Erasure Coding Have a Role to Play in My Data Center?

Today, replication has become the de facto standard for storing data within and across data centers that process data-intensive workloads. Erasure coding (a form of software RAID), although heavily researched and theoretically more space efficient than replication, involves complex tradeoffs that practitioners do not understand well. Today's data centers run diverse foreground and background data-intensive workloads, so getting these tradeoffs right is becoming increasingly important. Using a series of realistic data center deployment scenarios and workload characteristics, together with a prototype Hadoop library that implements erasure codec functionality, the authors revisit traditional metrics (performance and dollar cost), present new tradeoffs (power proportionality and complexity), and recommend research directions worth pursuing.
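To make the space-efficiency claim concrete, the following Python sketch uses the simplest erasure code: a single XOR parity block computed over k equal-size data blocks, as in RAID-5. It then compares raw bytes stored per logical byte against n-way replication. This is an illustration under stated assumptions, not the authors' Hadoop library; the function names and block contents are hypothetical. The overhead helper also evaluates a (6, 3) layout, the kind a Reed-Solomon code uses to survive any three lost blocks. The sketch does not implement Reed-Solomon; it only computes that layout's overhead, and the XOR code itself tolerates just one lost block.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_blocks: list[bytes]) -> bytes:
    """Single parity block (RAID-5 style) over k equal-length data blocks."""
    return reduce(xor_blocks, data_blocks)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block from the k-1 survivors plus parity."""
    return reduce(xor_blocks, surviving, parity)


def overhead_replication(copies: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    return float(copies)


def overhead_erasure(k: int, m: int) -> float:
    """Raw bytes stored per logical byte for k data blocks plus m parity blocks."""
    return (k + m) / k


if __name__ == "__main__":
    # Hypothetical stripe of k = 4 equal-size data blocks.
    blocks = [b"HDFS", b"data", b"blk3", b"blk4"]
    parity = encode(blocks)

    # Simulate losing block 2 and rebuilding it from the rest plus parity.
    rebuilt = recover(blocks[:2] + blocks[3:], parity)
    assert rebuilt == blocks[2]

    print(overhead_replication(3))  # 3-way replication: 3.0
    print(overhead_erasure(4, 1))   # XOR parity over 4 blocks: 1.25
    print(overhead_erasure(6, 3))   # (6, 3) layout: 1.5
```

The space saving has a cost: rebuilding one block here reads all k-1 surviving blocks plus the parity block, whereas replication can repair from a single surviving copy. This extra I/O and network traffic during recovery is one example of the kind of tradeoff that makes erasure coding harder to evaluate than its storage overhead alone suggests.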

Provided by: North Carolina State University
Topic: Data Centers
Date Added: May 2010
Format: PDF
