Scaling your Hybrid CPU-GPU DBMS to Multiple GPUs

Authors:

Bobbi W Yogatama, Weiwei Gong, Xiangyao Yu

Abstract

GPU-accelerated databases have gained popularity in recent years thanks to the massive parallelism and high memory bandwidth of GPUs. Limited GPU memory capacity, however, remains a major bottleneck for GPU databases. Existing approaches address this limitation with either (1) a hybrid CPU-GPU DBMS or (2) a multi-GPU DBMS. We aim to improve on these solutions by combining a hybrid CPU-GPU DBMS with a multi-GPU DBMS. In particular, we explore the design space and optimize data placement and query execution in a hybrid CPU and multi-GPU DBMS. To improve data placement, we introduce a cache-aware replication policy, which takes the cost of shuffling into account when replicating data and can coordinate caching and replication decisions for the best performance. To improve query execution, we extend the existing hybrid CPU-GPU query execution strategy with distributed query processing techniques to support multiple GPUs. We build Lancelot, a hybrid CPU and multi-GPU data analytics engine that integrates all of these optimizations. Our evaluation shows that cache-aware replication outperforms other policies by up to 2.5× and that Lancelot outperforms existing GPU DBMSes by at least 2× on the Star Schema Benchmark and 12× on the TPC-H benchmark.

PVLDB is part of the VLDB Endowment Inc.

Volume 17, No. 13
