A Hybrid Approach for Efficient Provenance Storage
Efficient provenance storage is an essential step towards the adoption of provenance. In this paper, the authors analyze provenance collected from multiple workloads with a view towards efficient storage. Based on this analysis, they characterize the properties of provenance that matter for long-term storage. They then propose a hybrid scheme that exploits both the graph structure of provenance data and the duplication inherent in it. Their evaluation indicates that this hybrid scheme, a combination of web graph compression (adapted for provenance) and dictionary encoding, offers the best tradeoff among compression ratio, compression time, and query performance compared with the other compression schemes evaluated.
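To make the two ingredients concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes the provenance graph is given as a dict mapping integer node ids to successor lists, with one string label per node. It applies gap encoding of sorted successor lists, which is only one ingredient of web graph compression; reference copying and interval encoding are omitted. Repeated node labels are handled by plain dictionary encoding. All function names (`gap_encode`, `dictionary_encode`, `compress`, and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: NOT the paper's scheme, only the two ideas it combines.


def gap_encode(node: int, successors: List[int]) -> List[int]:
    """Gap-encode a successor list (one ingredient of web graph compression).

    The first entry is stored relative to the source node (it may be negative);
    each later entry is the difference from the previous sorted successor, so
    nearby ids become small integers that a variable-length code stores cheaply.
    """
    s = sorted(successors)
    if not s:
        return []
    gaps = [s[0] - node]
    for prev, cur in zip(s, s[1:]):
        gaps.append(cur - prev)
    return gaps


def gap_decode(node: int, gaps: List[int]) -> List[int]:
    """Invert gap_encode, returning the sorted successor list."""
    if not gaps:
        return []
    out = [node + gaps[0]]
    for g in gaps[1:]:
        out.append(out[-1] + g)
    return out


def dictionary_encode(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Replace each distinct string with a small integer id (first-seen order)."""
    table: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in table:
            table[v] = len(table)
        codes.append(table[v])
    return table, codes


def dictionary_decode(table: Dict[str, int], codes: List[int]) -> List[str]:
    """Map integer ids back to their original strings."""
    inverse = {i: v for v, i in table.items()}
    return [inverse[c] for c in codes]


def compress(
    graph: Dict[int, List[int]], labels: Dict[int, str]
) -> Tuple[Dict[int, List[int]], Dict[str, int], Dict[int, int]]:
    """Hypothetical hybrid: gap-encode edges and dictionary-encode node labels."""
    edges = {n: gap_encode(n, succ) for n, succ in graph.items()}
    nodes = sorted(labels)
    table, codes = dictionary_encode([labels[n] for n in nodes])
    return edges, table, dict(zip(nodes, codes))


if __name__ == "__main__":
    # A process (node 10) with edges to three files (nodes 11, 12, 15).
    g = {10: [15, 11, 12]}
    lbl = {10: "exec", 11: "read", 12: "read", 15: "read"}
    print(compress(g, lbl))
```

For the example graph, the successors of node 10 become the small gaps `[1, 1, 3]`. The repeated label `"read"` is stored once in the dictionary and referenced by id. A real system would additionally write these integers with a variable-length code to realize the space savings.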