Volume 14, No. 4

Compact, Tamper-Resistant Archival of Fine-Grained Provenance

Authors:
Nan Zheng (University of Pennsylvania), Zack Ives (University of Pennsylvania)

Abstract

Data provenance tools aim to facilitate reproducible data science and auditable data analyses by tracking the processes and inputs responsible for each result of an analysis. Fine-grained provenance further enables sophisticated reasoning about why individual output results appear or fail to appear, which aids debugging and diagnosis. However, for provenance to be truly useful for reproducibility and auditing, we need a provenance archival system that ensures provenance is tamper-resistant and that stores provenance collected over many queries and across time efficiently (i.e., it compresses repeated results). In this paper we study this problem, developing solutions for storing fine-grained provenance in relational storage systems while both compressing it and protecting it via cryptographic hashes. We experimentally validate our proposed solutions using a variety of scientific and OLAP-based workloads.

PVLDB is part of the VLDB Endowment Inc.
