A Performance Comparison of Open-Source Erasure Coding Libraries for Storage Applications

Date Added: Aug 2008
Format: PDF

Recent years have seen a surge in the use of erasure codes to prevent data loss in storage systems composed of multiple disks. Storage companies such as Cleversafe, Data Domain, Network Appliance and Panasas offer products that use erasure codes for data availability, and technology majors like HP, IBM and Microsoft are actively researching erasure codes for storage systems. In this paper, the authors compare the performance of several open-source erasure coding libraries, concentrating on encoding and decoding. Performance runs the gamut from slow to fast, with the determining factors being the erasure coding technique used, the optimization of the underlying coding structure and attention to cache behavior. It is hard to draw overarching conclusions from a single performance study, but one clear conclusion is that reducing cache misses is more important than reducing XOR operations. Two avenues remain open. First, the tests should be performed on multiple machines so that individual machine quirks do not impact results. Second, in the XOR codes, the individual XOR operations can be scheduled in an exponential number of ways; choosing a schedule that improves cache utilization may yield further improvements in performance.
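
For readers unfamiliar with XOR-based codes, the idea can be sketched with the simplest case, a single parity block that tolerates the loss of one data block. This is an illustrative sketch only, not the API of any library covered by the paper; the function names `xor_blocks`, `encode_parity` and `recover_block` are hypothetical, and all blocks are assumed to have equal length.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two blocks (assumed to be the same length)."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_blocks):
    """Compute one parity block as the XOR of all data blocks."""
    return reduce(xor_blocks, data_blocks)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return reduce(xor_blocks, surviving_blocks, parity)


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]  # three hypothetical data blocks
    parity = encode_parity(data)

    # Simulate losing the middle block and rebuilding it.
    rebuilt = recover_block([data[0], data[2]], parity)
    print(rebuilt == data[1])

    # XOR is commutative and associative, so any ordering of the
    # operations yields the same parity block.
    print(encode_parity(list(reversed(data))) == parity)
```

Because the result does not depend on the order in which blocks are XORed, an implementation is free to pick whichever order touches memory most favorably, which is the scheduling freedom the summary refers to.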