CIDR Proceedings

This website is under development. If you come accross any issues, please report them to Konstantinos Kanellis (kkanellis@cs.wisc.edu) or Yannis Chronis (chronis@google.com).

Go Back

SageDB: A Learned Database System

Authors:

Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H Chi, Jialin Ding, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, Vikram Nathan

Download PDF

Abstract

Modern data processing systems are designed to be general purpose, in that they can handle a wide variety of diﬀerent schemas, data types, and data distributions, and aim to provide eﬃcient access to that data via the use of optimizers and cost models. This general purpose nature results in systems that do not take advantage of the characteristics of the particular application and data of the user. With SageDB we present a vision towards a new type of a data processing system, one which highly specializes to an application through code synthesis and machine learning. By modeling the data distribution, workload, and hardware, SageDB learns the structure of the data and optimal access methods and query plans. These learned models are deeply embedded, through code synthesis, in essentially every component of the database. As such, SageDB presents radical departure from the way database systems are currently developed, raising a host of new problems in databases, machine learning and programming systems.