DiskReduce: RAID for Data-Intensive Scalable Computing

Executive Summary

Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. In contrast, high performance computing systems of comparable scale, and smaller-scale enterprise storage systems, achieve similar tolerance of multiple failures with lower-overhead erasure coding, or RAID, organizations. DiskReduce is a modification of the Hadoop Distributed File System (HDFS) that asynchronously compresses initially triplicated data down to RAID-class redundancy overheads.
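
To make the overhead comparison concrete, the arithmetic can be sketched as follows. This is an illustration only: the function name `storage_overhead` and the group size of 8 data blocks plus 2 parity blocks are assumptions chosen for the example, not figures taken from the summary above.

```python
def storage_overhead(data_blocks: int, redundant_blocks: int) -> float:
    """Return redundant storage as a fraction of the user data stored."""
    if data_blocks <= 0:
        raise ValueError("data_blocks must be positive")
    return redundant_blocks / data_blocks


# Triplication: every data block is stored with two additional copies,
# so any two copies can be lost without losing the data.
triplication = storage_overhead(data_blocks=1, redundant_blocks=2)

# Hypothetical RAID 6 style group (assumed size): 8 data blocks protected
# by 2 parity blocks, which also survives any two block losses in the group.
raid6_group = storage_overhead(data_blocks=8, redundant_blocks=2)

print(f"triplication overhead: {triplication:.0%}")
print(f"RAID 6 (8+2) overhead: {raid6_group:.0%}")
```

Under these assumed parameters, both layouts tolerate two failures, but triplication stores two extra blocks per data block while the parity group stores a quarter of a block per data block.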

  • Format: PDF
  • Size: 301.2 KB