Themis: A GPU-accelerated Relational Query Execution Engine

Authors:

Kijae Hong, Kyoungmin Kim, Young-Koo Lee, Yang-Sae Moon, Sourav S Bhowmick, Wook-Shin Han

Download PDF

Abstract

GPU-accelerated relational query execution engines have paral-GPU-accelerated relational query execution engines have parallelized the execution of a pipeline, a sequence of operators. For the parallelization, the engines evenly partition the tuples in a table that will be scanned by the pipeline’s first operator (a scan), and each thread executes the pipeline for the tuples in a partition. However, this approach leads to load imbalances since an operator returns a varying number of output tuples per input tuple, particularly under non-uniform data distributions such as skewed join key val-under non-uniform data distributions such as skewed join key values. The load imbalances are classified into intra- and inter-warp load imbalances (intra-WLIs and inter-WLIs) since 1) threads are grouped into warps and 2) every thread in a warp evaluates the same operator for an input tuple concurrently following a single-same operator for an input tuple concurrently following a singleinstruction-multiple-thread manner. In contrast, threads in different warps can evaluate different operators concurrently. Although load balancing techniques have been proposed, however, they fail to solve the load imbalances on various workloads. In this paper, we propose a query execution engine, Themis, named after the deity of fairness, which symbolizes balanced workloads within our context. Themis minimizes intra-WLIs and inter-WLIs across various work-Themis minimizes intra-WLIs and inter-WLIs across various workloads. First, Themis minimizes intra-WLIs by redistributing tuples between the threads in a warp and making the threads evaluate an operator only when all of them hold inputs. Second, Themis mitigates the inter-WLIs by redistributing the tuples of warps with heavy workloads to idle warps. To check whether a warp’s work-heavy workloads to idle warps. To check whether a warp’s workload is heavy, we propose a method to approximate the sizes of warps’ workloads. Based on these approximations, Themis adap-warps’ workloads. Based on these approximations, Themis adaptively adjusts the threshold for determining a warp’s workload as heavy. In a recent benchmark JCC-H, which introduces skewed join key distributions to TPC-H, Themis significantly alleviates the inter-WLIs and intra-WLIs, outperforming the runner-up by up to 379x.

PVLDB is part of the VLDB Endowment Inc.

Start

Current Submission

All Volumes

Reproducibility

General Information

Volume 18, No. 2

Themis: A GPU-accelerated Relational Query Execution Engine

Abstract