Towards Automated Collection of Application-Level Data Provenance
Gathering data provenance at the operating system level is useful for capturing system-wide activity. However, many modern programs are complex and perform numerous tasks concurrently. Capturing their provenance at the operating system level, where each process is treated as a single entity, can lose useful intra-process detail. This can, in turn, introduce false dependencies into the provenance graph, since an output of a process can appear to depend on every input that process has read. Using the LLVM compiler framework and the SPADE provenance infrastructure, the authors investigate adding provenance instrumentation so that intra-process provenance can be captured automatically.
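To make the notion of a false dependency concrete, the following is a minimal illustrative sketch in Python, not the authors' LLVM instrumentation and not SPADE's API. The trace, the function names (`compress`, `checksum`), and the file names are all hypothetical, and the model is deliberately simplified: it ignores the ordering of reads and writes and assumes each output depends on every input read by the same unit of execution.

```python
from itertools import product

# Hypothetical trace of one process: (function, operation, file).
# All names here are assumptions made up for illustration.
TRACE = [
    ("compress", "read", "a.txt"),
    ("compress", "write", "a.gz"),
    ("checksum", "read", "b.txt"),
    ("checksum", "write", "b.sum"),
]


def dependencies(trace, per_function):
    """Return the set of (output, input) pairs implied by the trace.

    With per_function=False, the whole process is one unit, as in
    operating-system-level capture. With per_function=True, each
    function is its own unit, approximating intra-process capture.
    """
    reads, writes = {}, {}
    for func, op, path in trace:
        unit = func if per_function else "process"
        target = reads if op == "read" else writes
        target.setdefault(unit, set()).add(path)

    deps = set()
    for unit, outputs in writes.items():
        # Simplification: every output depends on every input of its unit.
        deps |= set(product(outputs, reads.get(unit, set())))
    return deps


process_level = dependencies(TRACE, per_function=False)
function_level = dependencies(TRACE, per_function=True)

# Edges that exist only because the process was treated as one entity.
false_dependencies = process_level - function_level

for output, source in sorted(false_dependencies):
    print(f"{output} falsely appears to depend on {source}")
```

In this toy model the process-level view links `a.gz` to `b.txt` and `b.sum` to `a.txt`, although the two functions never share data. Recording activity per function removes those two edges.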