Putting Lipstick on Pig: Enabling Database-Style Workflow Provenance
Workflow provenance typically assumes that each module is a "black box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. The authors present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in their solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance.
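
The contrast between coarse-grained and fine-grained dependencies can be illustrated with a minimal Python sketch. It is not the authors' framework and does not use Pig Latin or their provenance graph; the module `PriceModule`, its discount rule, and all record identifiers are hypothetical, and "fine-grained" is simplified here to mean the input and state tuples that the computation actually read to produce the output.

```python
from dataclasses import dataclass, field


@dataclass
class PriceModule:
    """Hypothetical module: quotes the lowest offer for a requested model."""

    # Internal state: requests from earlier invocations, kept across calls.
    history: list = field(default_factory=list)

    def invoke(self, request, offers):
        # offers is a list of (offer_id, model, price) tuples.
        matching = [o for o in offers if o[1] == request["model"]]
        best = min(matching, key=lambda o: o[2])

        # Assumed rule: discount 10 per earlier request for the same model,
        # so the output depends on internal state as well as on the inputs.
        earlier = [h for h in self.history if h["model"] == request["model"]]
        price = best[2] - 10 * len(earlier)

        # Coarse-grained (black-box) view: the output depends on every input.
        coarse = {request["id"]} | {o[0] for o in offers}
        # Simplified fine-grained view: the request, the chosen offer, and
        # the state entries that influenced the discount.
        fine = {request["id"], best[0]} | {h["id"] for h in earlier}

        self.history.append(request)
        return price, coarse, fine


module = PriceModule()
offers = [("o1", "civic", 200), ("o2", "civic", 180), ("o3", "accord", 250)]

# Two invocations with identical offers but different internal state.
p1, coarse1, fine1 = module.invoke({"id": "r1", "model": "civic"}, offers)
p2, coarse2, fine2 = module.invoke({"id": "r2", "model": "civic"}, offers)

print(p1, sorted(fine1))  # 180 ['o2', 'r1']
print(p2, sorted(fine2))  # 170 ['o2', 'r1', 'r2']
```

In this sketch the second invocation returns a different price for the same offers because the module's state changed, and only the fine-grained set records that the earlier request `r1` contributed. The coarse-grained set lists every offer, including the unrelated `o3`, and misses the state dependency entirely.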