SageDB: An Instance-Optimized Data Analytics System

Authors:

Jialin Ding, Ryan C Marcus, Andreas Kipf, Vikram Nathan, Aniruddha Nrusimha, Kapil Vaidya, Alexander Van Renen, Tim Kraska

Download PDF

Abstract

Modern data systems are typically both complex and general-purpose. They are complex because of the numerous internal knobs and parameters that users need to manually tune in order to achieve good performance; they are general-purpose because they are designed to handle diverse use cases, and therefore often do not achieve the best possible performance for any specific use case. A recent trend aims to tackle these pitfalls: instance-optimized systems are designed to automatically self-adjust in order to achieve the best performance for a specific use case, i.e., a dataset and query workload. Thus far, the research community has focused on creating instance-optimized database components, such as learned indexes and learned cardinality estimators, which are evaluated in isolation. However, to the best of our knowledge, there is no complete data system built with instance-optimization as a foundational design principle. In this paper, we present a progress report on SageDB, our effort towards building the first instance-optimized data system. SageDB synthesizes various instance-optimization techniques to automatically specialize for a given use case, while simultaneously exposing a simple user interface that places minimal technical burden on the user. Our prototype outperforms a commercial cloud-based analytics system by up to 3× on end-to-end query workloads and up to 250× on individual queries. SageDB is an ongoing research effort, and we highlight our lessons learned and key directions for future work.

PVLDB is part of the VLDB Endowment Inc.

Start

Current Submission

All Volumes

Reproducibility

General Information

Volume 15, No. 13

SageDB: An Instance-Optimized Data Analytics System

Abstract