Reconciling Scratch Space Consumption, Exposure, and Volatility to Achieve Timely Staging of Job Input Data
Innovative scientific applications and emerging dense data sources are creating a data deluge for high-end computing systems. Processing such large input data typically involves copying (or staging) onto the supercomputer's specialized high-speed storage, scratch space, for sustained high I/O throughput. The current practice of conservatively staging data as early as possible makes the data vulnerable to storage failures, which may entail re-staging and consequently reduced job throughput. To address this, the authors present a timely staging framework that uses a combination of job startup time predictions, user-specified intermediate nodes, and decentralized data delivery to coincide input data staging with job start-up.