Capturing Distributed Provenance Metadata from Cloud-Based Scientific Workflows
Scientific workflows are abstractions used to model scientific experiments. Running these experiments often requires high-performance computing environments such as clusters and grids. The scientific community is also beginning to adopt cloud computing; however, support for collecting and recording retrospective workflow provenance in cloud environments is still incipient. This article presents an approach to capturing distributed provenance metadata from cloud-based scientific workflows. The approach was implemented as an evolution of the Matrioshka architecture, refactored for cloud environments. Preliminary results show that provenance metadata captured from virtual components running in the cloud can help scientists manage and reproduce their large-scale in silico experiments.