
Behavior-Based Problem Localization for Parallel File Systems


Executive Summary

Large scientific applications exhibit compute-intensive behavior interspersed with periods of intense parallel I/O, and therefore depend on file systems that support high-bandwidth concurrent writes. The Parallel Virtual File System (PVFS) is an open-source parallel file system that provides such applications with high-speed access to files. PVFS has a client-server architecture, with many clients communicating with multiple I/O servers and one or more metadata servers.

The authors present a behavior-based problem-diagnosis approach for PVFS that analyzes a novel source of instrumentation, CPU instruction-pointer samples and function-call traces, to localize the faulty server and to enable root-cause analysis of the resource at fault. They validate the approach by injecting realistic storage and network problems into three different workloads (dd, IOzone, and PostMark) on a PVFS cluster.
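The summary does not describe how the instruction-pointer samples are turned into a verdict. One plausible reading, offered here only as an assumption, is peer comparison: under the same workload, fault-free PVFS servers should spend their CPU time in similar functions, so the server whose sample profile deviates most from its peers is the suspect. The minimal Python sketch below illustrates that idea. The function names, the L1 distance, the median aggregation, and the `threshold` value are all hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
from statistics import median


def histogram(samples):
    """Normalize a list of sampled function names into a probability distribution."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {fn: c / total for fn, c in counts.items()}


def l1_distance(p, q):
    """Sum of absolute differences between two distributions over the union of keys."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def localize_faulty_server(samples_by_server, threshold=0.5):
    """Return the server whose profile deviates most from its peers, or None.

    samples_by_server maps a server name to the function names hit by its
    instruction-pointer samples in one time window (hypothetical input format).
    threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    if len(samples_by_server) < 3:
        raise ValueError("peer comparison needs at least three servers")
    hists = {s: histogram(v) for s, v in samples_by_server.items()}
    scores = {}
    for s, h in hists.items():
        # Median distance to peers: a single faulty server inflates every
        # peer's distance by only one term, so healthy servers keep low scores.
        scores[s] = median(l1_distance(h, other) for t, other in hists.items() if t != s)
    suspect = max(scores, key=scores.get)
    return suspect if scores[suspect] > threshold else None


# Hypothetical example: ios3 spends most of its samples in an I/O wait path.
servers = {
    "ios0": ["flow_callback"] * 60 + ["bmi_recv"] * 40,
    "ios1": ["flow_callback"] * 58 + ["bmi_recv"] * 42,
    "ios2": ["flow_callback"] * 61 + ["bmi_recv"] * 39,
    "ios3": ["io_wait"] * 70 + ["flow_callback"] * 20 + ["bmi_recv"] * 10,
}
print(localize_faulty_server(servers))
```

Using the median of peer distances, rather than the mean, keeps the scores of healthy servers low even though each of them is also "far" from the one faulty server. With the hypothetical data above, `ios3` is the only server whose median distance exceeds the threshold; dropping `ios3` from the input would make the function return `None`.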
