go back
go back
Volume 15, No. 2
Origami: A High-Performance Mergesort Framework
Abstract
Mergesort is a popular kernel for real-world workloads as it is immune to data skewness, supports streaming operations, and is relatively less complex to parallelize. We introduce Origami, a mergesort framework that presents a four-phase pipeline and develops optimized solutions for each phase that are generalizable to scalar and all SIMD CPU architectures. Within the same extension set, we present the fastest in-register method for sorting small sequences and a branchless merger of sorted lists that achieves upto $1.5\times$ speedup over the naive merge. In addition, we introduce a cache residing quad merge tree to avoid bottlenecking on memory bandwidth and a parallel partitioning scheme to maximize thread level parallelism. We develop the end-to-end sort with these components and produce a highly utilized pipeline by reducing synchronization overheads between phases. Origami single thread mergesort performs upto $2\times$ faster than the closest competitor and achieves a nearly perfect speedup in multi-core environments.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy