
Volume 15, No. 3

Provenance-based Data Skipping

Authors:
Xing Niu (Illinois Institute of Technology)*, Boris Glavic (Illinois Institute of Technology), Ziyu Liu (Illinois Institute of Technology), Pengyuan Li (Illinois Institute of Technology), Dieter Gawlick (Oracle), Vasudha Krishnaswamy (Oracle, USA), Zhen Hua Liu (Oracle), Danica Porobic (Oracle)

Abstract

Database systems analyze queries to determine upfront which data is needed to answer them, and use indexes and other physical design techniques to speed up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine upfront what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured, it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps. Our approach significantly improves performance for both disk-based and main-memory database systems.
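As a rough illustration of the capture-then-reuse idea (not the authors' implementation), the sketch below assumes that a provenance sketch is represented as the set of fragments of a range partition over one attribute that contain provenance of a HAVING query's result. The table `ROWS`, the partition `BOUNDS`, and the functions `capture_sketch` and `run_with_sketch` are hypothetical. The range filter applied in `run_with_sketch` stands in for the kind of predicate an index or zone map could evaluate to skip data.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical table R(g, v) with grouping attribute g and value v.
ROWS = [(1, 50), (1, 70), (2, 10), (3, 20), (4, 30), (4, 20), (6, 90), (7, 150)]

# Assumed range partition on g: fragment i covers [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS = [0, 2, 4, 6, 8]


def fragment_of(g):
    """Index of the range fragment that contains value g."""
    return bisect_right(BOUNDS, g) - 1


def having_query(rows, threshold=100):
    """SELECT g, SUM(v) FROM R GROUP BY g HAVING SUM(v) > threshold."""
    sums = defaultdict(int)
    for g, v in rows:
        sums[g] += v
    return {g: s for g, s in sums.items() if s > threshold}


def capture_sketch(rows, threshold=100):
    """Fragments holding provenance: every input row of a qualifying group."""
    result = having_query(rows, threshold)
    return {fragment_of(g) for g, _ in rows if g in result}


def sketch_to_ranges(sketch):
    """Turn the sketch into range conditions an index or zone map could use."""
    return [(BOUNDS[i], BOUNDS[i + 1]) for i in sorted(sketch)]


def run_with_sketch(rows, sketch, threshold=100):
    """Re-run the query, reading only rows that fall into sketched ranges."""
    ranges = sketch_to_ranges(sketch)
    kept = [(g, v) for g, v in rows if any(lo <= g < hi for lo, hi in ranges)]
    return having_query(kept, threshold)


sketch = capture_sketch(ROWS)
print(sketch)                    # {0, 3}
print(sketch_to_ranges(sketch))  # [(0, 2), (6, 8)]
print(run_with_sketch(ROWS, sketch) == having_query(ROWS))  # True
```

Here only groups 1 and 7 pass the HAVING condition, so fragments 1 and 2 are never read when the query is re-run with the sketch. Because the partition in this toy example is on the grouping attribute, each fragment holds complete groups and skipping the other fragments cannot change any group's sum; with other partitioning attributes, whether a sketch is safe to reuse needs more care.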

PVLDB is part of the VLDB Endowment Inc.
