Hardware-Oblivious SIMD Parallelism for In-Memory Column-Stores
Abstract
Vectorization based on the Single Instruction Multiple Data (SIMD) parallel paradigm is a core technique for improving query processing performance, especially in state-of-the-art in-memory column-stores. Mainstream CPUs offer vectorization through a large number of powerful SIMD extensions, which are growing not only in vector size but also in the complexity of the instruction sets they provide. However, programming with vector extensions is a non-trivial task and is currently done in a hardware-conscious way, that is, with code written explicitly against one specific extension. This implementation process is not only error-prone but also requires considerable effort to adopt new vector extensions or to port existing code to other ones. To overcome these problems, we present a Template Vector Library (TVL) as a hardware-oblivious concept in this paper. We show that our single-source, hardware-oblivious implementation runs efficiently on different SIMD extensions as well as on a pure vector engine. Moreover, we demonstrate that this concept opens up several new optimization opportunities that are difficult to realize without a hardware-oblivious approach.