Just-in-Time Staging of Large Input Data for Supercomputing Jobs
High-performance computing is facing a data deluge from state-of-the-art colliders and observatories. Large datasets from these facilities, and from other end-user sites, are often inputs to intensive analyses on modern supercomputers. Timely staging of input data into the supercomputer's local storage can not only optimize space usage but also protect against delays due to storage system failures. To this end, the authors propose a just-in-time staging framework that uses a combination of batch-queue predictions, user-specified intermediate nodes, and decentralized data delivery to make input data staging coincide with job startup.