Provenance for Generalized Map and Reduce Workflows
Source: Stanford University
The authors consider a class of workflows, which they call Generalized Map and Reduce Workflows (GMRWs), in which input data sets are processed by an acyclic graph of map and reduce functions to produce output results. They show how data provenance (sometimes called lineage) can be captured transparently for map and reduce functions. The captured provenance then supports backward tracing (finding the input subsets that contributed to a given output element) and forward tracing (determining which output elements were derived from a particular input element).
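The idea of capturing provenance around map and reduce functions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Tracker` class, its method names, and the word-count workflow are all hypothetical and chosen only for illustration. The sketch assumes that a map function turns one input element into zero or more outputs, and that a reduce function receives a key with all of its grouped values, so each reduce output is derived from every element in its group.

```python
from collections import defaultdict
from itertools import count


class Tracker:
    """Hypothetical wrapper that records which elements each result came from."""

    def __init__(self):
        self.values = {}                  # element id -> value
        self.parents = defaultdict(set)   # element id -> ids it was derived from
        self.children = defaultdict(set)  # element id -> ids derived from it
        self.inputs = set()               # ids of original input elements
        self._next_id = count()

    def _emit(self, value, sources=()):
        eid = next(self._next_id)
        self.values[eid] = value
        for src in sources:
            self.parents[eid].add(src)
            self.children[src].add(eid)
        return eid

    def load(self, items):
        ids = [self._emit(item) for item in items]
        self.inputs.update(ids)
        return ids

    def map(self, fn, ids):
        # Each map output is derived from exactly one input element.
        return [self._emit(v, [i]) for i in ids for v in fn(self.values[i])]

    def reduce(self, fn, ids):
        # Elements are (key, value) pairs; each reduce output is derived
        # from every element that shares its key.
        groups = defaultdict(list)
        for i in ids:
            groups[self.values[i][0]].append(i)
        out = []
        for key, members in groups.items():
            vals = [self.values[i][1] for i in members]
            out.extend(self._emit(v, members) for v in fn(key, vals))
        return out

    @staticmethod
    def _reachable(start, edges):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return seen

    def backward(self, eid):
        """Input elements that contributed to element `eid`."""
        return self._reachable(eid, self.parents) & self.inputs

    def forward(self, eid, outputs):
        """Elements of `outputs` that were derived from element `eid`."""
        return self._reachable(eid, self.children) & set(outputs)


# Illustrative two-step workflow: split lines into words, then count them.
t = Tracker()
in_ids = t.load(["a b", "b c"])
pairs = t.map(lambda line: [(w, 1) for w in line.split()], in_ids)
out_ids = t.reduce(lambda key, vals: [(key, sum(vals))], pairs)

b_count = out_ids[1]                        # the ("b", 2) result
print(t.values[b_count], t.backward(b_count))
print(t.forward(in_ids[1], out_ids))
```

Because the user-supplied map and reduce functions are passed in unchanged and the bookkeeping lives entirely in the wrapper, this mirrors the "transparent" capture described above in spirit. Longer acyclic workflows would chain further `map` and `reduce` calls on the returned ids.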