Towards Foundation Database Models
Abstract
Recently, machine learning models have been used to solve many database tasks in both academia and industry. The state-of-the-art for such internal database tasks relies on one-off models that must be trained individually per task, and often even per dataset, which incurs extremely high training overheads. In this paper, we argue that a new learning paradigm is needed that moves away from such one-off models towards generalizable models that can be applied to a wide spectrum of tasks on unseen datasets with only minimal overhead. While several advances towards more generalizable models have recently been made, no existing model generalizes across both datasets and tasks. We therefore propose a new direction, which we call foundation models for databases: models that are pre-trained in both a task-agnostic and a dataset-agnostic manner, allowing them to solve a wide spectrum of downstream tasks on unseen datasets with low overhead. In this vision paper, we propose an architecture for such a foundation database model, present a promising feasibility study with a first prototype, and discuss a research roadmap for addressing the open challenges.