Bolt-on, Compact, and Rapid Program Slicing for Notebooks [Scalable Data Science]

Authors:

Shreya Shankar, Stephen Macke, Sarah Chasins, Andrew Head, Aditya Parameswaran

Abstract

Computational notebooks are commonly used for iterative workflows, such as in exploratory data analysis. This process lends itself to the accumulation of old code and hidden state, making it hard for users to reason about the lineage of, e.g., plots depicting insights or trained machine learning models. One way to reason about code used to generate various notebook data artifacts is to compute a program slice, but traditional static approaches to slicing can be both inaccurate (failing to contain relevant code for artifacts) and conservative (containing unnecessary code for an artifacts). We present nbslicer, a dynamic slicer optimized for the notebook setting whose instrumentation for resolving dynamic data dependencies is both bolt-on (and therefore portable) and switchable (allowing it to be selectively disabled in order to reduce instrumentation overhead). We demonstrate nbslicer’s ability to construct small and accurate backward slices (i.e., historical cell dependencies) and forward slices (i.e., cells affected by the "rerun" of an earlier cell), thereby improving reproducibility in notebooks and enabling faster reactive re-execution, respectively. Comparing nbslicer with a static slicer on 374 real notebook sessions, we found that nbslicer filters out far more superfluous program statements while maintaining slice correctness, giving slices that are, on average, 66% and 54% smaller for backward and forward slices, respectively.

PVLDB is part of the VLDB Endowment Inc.

Start

Current Submission

All Volumes

Reproducibility

General Information

Volume 15, No. 13

Bolt-on, Compact, and Rapid Program Slicing for Notebooks [Scalable Data Science]

Abstract