A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems
Many scientific workflows are data intensive: a large volume of intermediate data is generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, such data are stored selectively according to the system's storage capacity, with the selection made manually. As running scientific applications in the cloud has become popular, more intermediate data can be stored in scientific cloud workflows under a pay-for-use model. In this paper, the authors build an Intermediate Data dependency Graph (IDG) from the data provenance in scientific workflows.
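To make the idea of an IDG concrete, the following is a minimal sketch, not the authors' implementation. It assumes provenance is available as a mapping from each intermediate dataset to the datasets it was directly derived from; the names `build_idg`, `ancestors`, and the dataset labels `d1` to `d4` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_idg(provenance):
    """Build a dependency graph from provenance records.

    `provenance` maps each dataset to the list of datasets it was
    directly derived from (an assumed, simplified provenance format).
    Returns a dict mapping each dataset to the set of datasets that
    are directly derived from it.
    """
    successors = defaultdict(set)
    for dataset, inputs in provenance.items():
        successors[dataset]  # make sure every dataset appears as a node
        for src in inputs:
            successors[src].add(dataset)
    return dict(successors)


def ancestors(provenance, dataset):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = list(provenance.get(dataset, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(provenance.get(current, ()))
    return seen


# Hypothetical provenance: d2 and d3 are derived from d1, d4 from d2 and d3.
provenance = {"d1": [], "d2": ["d1"], "d3": ["d1"], "d4": ["d2", "d3"]}
idg = build_idg(provenance)
```

Under these assumptions, `ancestors(provenance, "d4")` lists the datasets that would be involved in regenerating `d4` if none of them were stored, which is the kind of dependency information a storage decision can draw on.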