Volume 14, No. 4
Compact, Tamper-Resistant Archival of Fine-Grained Provenance
Abstract
Data provenance tools aim to facilitate reproducible data science and auditable data analyses by tracking the processes and inputs responsible for each result of an analysis. Fine-grained provenance further enables sophisticated reasoning about why individual output results appear or fail to appear, which aids debugging and diagnosis. However, for provenance to be truly useful for reproducibility and auditing, we need an archival system that keeps it tamper-resistant and that stores provenance collected over many queries and across time efficiently (i.e., by compressing repeated results). In this paper we study this problem, developing solutions for storing fine-grained provenance in relational storage systems that both compress it and protect it via cryptographic hashes. We experimentally validate our proposed solutions on a variety of scientific and OLAP workloads.
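The abstract does not describe the storage scheme itself. To show how one cryptographic hash can serve both goals at once, here is a minimal sketch under stated assumptions, not the method developed in the paper. Each provenance record is keyed by the SHA-256 of its canonical JSON encoding, so a result repeated across runs is stored only once. Each query run is appended to a hash chain, so later edits to stored records or run entries become detectable. The names `ProvenanceArchive`, `digest`, `store_run`, `verify` and the `records`/`runs` tables are hypothetical, and the in-memory `head` stands in for a digest that a real system would publish or keep outside the database.

```python
import hashlib
import json
import sqlite3


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra spaces)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class ProvenanceArchive:
    """Hypothetical content-addressed, hash-chained provenance store (illustration only)."""

    GENESIS = "0" * 64  # chain value before any run is recorded

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Each distinct provenance record is stored once, keyed by its hash.
        self.db.execute("CREATE TABLE records (h TEXT PRIMARY KEY, body TEXT NOT NULL)")
        # Each run lists the hashes of its results plus the chain value after it.
        self.db.execute(
            "CREATE TABLE runs (id INTEGER PRIMARY KEY, query TEXT, "
            "result_hashes TEXT, chain TEXT)"
        )
        self.head = self.GENESIS  # assumption: in practice, published externally

    def store_run(self, query: str, records: list) -> str:
        hashes = []
        for rec in records:
            h = digest(rec)
            # INSERT OR IGNORE deduplicates results repeated across runs.
            self.db.execute(
                "INSERT OR IGNORE INTO records VALUES (?, ?)",
                (h, json.dumps(rec, sort_keys=True)),
            )
            hashes.append(h)
        # The new chain value commits to the previous one, the query, and its results.
        self.head = digest([self.head, query, hashes])
        self.db.execute(
            "INSERT INTO runs (query, result_hashes, chain) VALUES (?, ?, ?)",
            (query, json.dumps(hashes), self.head),
        )
        self.db.commit()
        return self.head

    def verify(self) -> bool:
        # Recompute every record hash; a modified body no longer matches its key.
        known = set()
        for h, body in self.db.execute("SELECT h, body FROM records"):
            if digest(json.loads(body)) != h:
                return False
            known.add(h)
        # Replay the chain; edited, dropped, or reordered runs change the final value.
        prev = self.GENESIS
        rows = self.db.execute("SELECT query, result_hashes, chain FROM runs ORDER BY id")
        for query, hs, chain in rows:
            hashes = json.loads(hs)
            if not set(hashes) <= known:  # a referenced record was deleted
                return False
            prev = digest([prev, query, hashes])
            if prev != chain:
                return False
        return prev == self.head


# Usage: two runs of the same query produce one stored record.
archive = ProvenanceArchive()
row = {"tuple": ["alice", 42], "why": [["R", 1], ["S", 7]]}
archive.store_run("Q1", [row])
archive.store_run("Q1", [row])
print(archive.db.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 1
print(archive.verify())  # True
```

Keying rows by content hash means deduplication and integrity checking share the same computation, which is the kind of "compress and protect" pairing the abstract points to. A real archive would also need finer-grained sharing between records that overlap only partly, which this sketch does not attempt.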
PVLDB is part of the VLDB Endowment Inc.