Provenance for Generalized Map and Reduce Workflows

The authors consider a class of workflows, called Generalized Map and Reduce Workflows (GMRWs), in which input data sets are processed by an acyclic graph of map and reduce functions to produce output results. They show how data provenance (also called lineage) can be captured transparently for both map and reduce functions. The captured provenance then supports backward tracing (finding the input subsets that contributed to a given output element) and forward tracing (determining which output elements were derived from a particular input element).
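
To make the idea concrete, here is a minimal Python sketch of provenance capture and tracing for a two-stage word-count pipeline. It is an illustration only and not the authors' implementation: the function names (map_with_provenance, reduce_with_provenance, backward_trace, forward_trace), the element ID scheme, and the in-memory dictionaries are all assumptions made for this example, and it handles a linear chain of stages rather than a general acyclic graph.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; none of these names come from the paper.

def map_with_provenance(map_fn, inputs):
    """Apply map_fn to each input element.

    inputs: dict of element_id -> value. map_fn returns a list of outputs.
    Each map output derives from exactly one input element.
    """
    outputs, provenance = {}, {}
    for input_id, value in inputs.items():
        for i, out in enumerate(map_fn(value)):
            output_id = f"{input_id}/m{i}"  # assumed ID scheme
            outputs[output_id] = out
            provenance[output_id] = {input_id}
    return outputs, provenance

def reduce_with_provenance(reduce_fn, inputs):
    """Group (key, value) elements by key and reduce each group.

    Each reduce output derives from every element in its group.
    """
    groups = defaultdict(list)
    for input_id, (key, value) in inputs.items():
        groups[key].append((input_id, value))
    outputs, provenance = {}, {}
    for key, members in groups.items():
        output_id = f"r:{key}"  # assumed ID scheme
        outputs[output_id] = (key, reduce_fn(key, [v for _, v in members]))
        provenance[output_id] = {member_id for member_id, _ in members}
    return outputs, provenance

def backward_trace(output_id, stages):
    """Walk provenance maps from last stage to first to find contributing inputs."""
    frontier = {output_id}
    for prov in reversed(stages):
        frontier = set().union(*(prov.get(e, set()) for e in frontier))
    return frontier

def forward_trace(input_id, stages):
    """Walk provenance maps from first stage to last to find derived outputs."""
    frontier = {input_id}
    for prov in stages:
        frontier = {out for out, sources in prov.items() if sources & frontier}
    return frontier

# Word count over two tiny documents.
docs = {"d1": "a b", "d2": "b c"}
mapped, map_prov = map_with_provenance(
    lambda line: [(word, 1) for word in line.split()], docs
)
reduced, reduce_prov = reduce_with_provenance(
    lambda key, values: sum(values), mapped
)
stages = [map_prov, reduce_prov]

contributors = backward_trace("r:b", stages)  # inputs behind the count for "b"
derived = forward_trace("d1", stages)         # outputs that depend on document d1
```

In this sketch the map stage records one source per output, while the reduce stage records the whole group that shares a key. Tracing in either direction is then a matter of following those recorded links stage by stage.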

Provided by: Stanford University
Topic: Enterprise Software
Date Added: Jan 2011
Format: PDF
