Mixing Hadoop and HPC Workloads on Parallel Filesystems

Date Added: Nov 2009
Format: PDF

MapReduce-tailored distributed filesystems, such as HDFS for Hadoop MapReduce, and parallel high-performance computing (HPC) filesystems are designed for considerably different workloads. The authors examine how each kind of filesystem performs when both sorts of workload run on it concurrently. They study two workloads on two filesystems: for the HPC workload, they use the IOR checkpointing benchmark on the Parallel Virtual File System, Version 2 (PVFS); for Hadoop, they use an HTTP attack classifier on the CloudStore filesystem. They then measure each filesystem's performance when it concurrently serves its native workload alongside the non-native one.
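The contrast between the two workload types can be illustrated with a small sketch. This is not the authors' benchmark: it is a hypothetical Python example that approximates an HPC checkpoint-style pattern (large sequential writes, as IOR issues) and a MapReduce-style pattern (streaming reads over input data), runs them concurrently against an ordinary local directory, and reports each workload's throughput. All function names, file names, and sizes here are illustrative assumptions, not part of the paper.

```python
import os
import tempfile
import threading
import time


def checkpoint_style_writer(path, block_size=1 << 20, blocks=8):
    """HPC checkpoint-style workload: large sequential writes, then fsync.

    Returns write throughput in bytes/second. (Illustrative stand-in for
    an IOR-like access pattern, not the IOR benchmark itself.)
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        buf = b"\0" * block_size
        for _ in range(blocks):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return (block_size * blocks) / elapsed


def mapreduce_style_reader(path, chunk_size=64 << 10):
    """MapReduce-style workload: streaming sequential reads over an input file.

    Returns read throughput in bytes/second.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed


def run_concurrently(workdir):
    """Run both workloads at the same time, mimicking the mixed-workload
    scenario the paper studies, and return each one's throughput."""
    ckpt_path = os.path.join(workdir, "checkpoint.dat")
    input_path = os.path.join(workdir, "input.dat")
    # Seed input data for the reader before the contention test starts.
    with open(input_path, "wb") as f:
        f.write(b"x" * (4 << 20))

    results = {}
    writer = threading.Thread(
        target=lambda: results.update(hpc_write_bps=checkpoint_style_writer(ckpt_path)))
    reader = threading.Thread(
        target=lambda: results.update(mapreduce_read_bps=mapreduce_style_reader(input_path)))
    for t in (writer, reader):
        t.start()
    for t in (writer, reader):
        t.join()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_concurrently(d))
```

Running the same pair of functions alone and then together gives a crude version of the paper's comparison: the gap between solo and concurrent throughput indicates how much each access pattern interferes with the other on a given filesystem.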