On the Duality of Data-Intensive File System Design: Reconciling HDFS and PVFS

Data-intensive applications fall into two computing styles: Internet services (cloud computing) and High-Performance Computing (HPC). In both styles, the underlying file system is a key component of scalable application performance. In this paper, the authors explore the similarities and differences between PVFS, a parallel file system used at large scale in HPC, and HDFS, the primary storage system used in cloud computing with Hadoop. They integrate PVFS into Hadoop and compare its performance with that of HDFS using a set of data-intensive computing benchmarks. They also study how HDFS-specific optimizations can be matched using PVFS, and how the consistency, durability, and persistence tradeoffs made by these file systems affect application performance.

Provided by: Association for Computing Machinery | Topic: Storage | Date Added: Nov 2011 | Format: PDF
