Composable Data Management: An Execution Overview

Authors:

Pedro Pedreira, Deepak Majeti, Orri Erling

Abstract

The trend of decomposing monolithic data management systems into a stack of reusable components has quickly gained momentum across the industry. Although a series of open-source projects have emerged targeting different layers of the stack, execution engines are of special importance due to the complexity they encapsulate and the demand to optimize price-performance. In this tutorial, we will survey the space of composability in data management, focusing on the execution layer. We will discuss the main APIs, integration with existing and novel data management systems, and how specialized behavior can be accommodated through extensibility APIs. With an emphasis on analytics, we will take a deeper dive into performance, discussing modern aspects of vectorization, compressed (encoding-aware) execution, and adaptivity. Although the presentation is grounded in real-world examples and experience from developing the Velox open-source execution engine and its integrations with existing systems like Presto (Prestissimo) and Spark (Gluten), the concepts and techniques discussed are generally applicable to other execution engines. Finally, we will discuss future trends and ongoing work regarding novel file formats, compressed execution opportunities, and nascent hardware acceleration efforts, highlighting current challenges and open questions. By surveying the state of the art in this space, we hope this tutorial will help motivate individuals and organizations to embrace composability and promote collaboration across related projects.

PVLDB is part of the VLDB Endowment Inc.

Volume 17, No. 12
