A Model for Query Execution Over Heterogeneous Instances
Abstract
Decreasing VM start-up times and recent trends in serverless computing on public clouds now allow users to spin up a dedicated cluster for an SQL query, in contrast to the longstanding paradigm of submitting queries to a fixed cluster. This new capability places additional responsibility on the query planner, which now must recommend the most cost-efficient cluster configuration for a given query. While the space of potential cluster configurations is immense, recent work has mostly focused on homogeneous clusters that consist of only one instance type. We argue that to truly leverage the flexibility of public clouds, we need to consider heterogeneous clusters consisting of multiple instance types. In this paper, we present a framework to gauge the optimality of cluster configurations, an intuitive model showcasing how heterogeneity leads to performance improvements for pipelined joins, and preliminary experimental evidence supporting the model.