This website is under development. If you come accross any issues, please report them to Konstantinos Kanellis (kkanellis@cs.wisc.edu) or Yannis Chronis (chronis@google.com).

The 3D Hash Join: Building On Non-Unique Join Attributes

Authors:
Daniel Flachs, Magnus Müller, Guido Moerkotte
Abstract

One of the most prominent ways to evaluate an equi-join is based on hashing. We consider the problem of non-unique join attributes on the build side. In conventional hash tables where collisions are resolved by chaining, duplicates inevitably lead to long collision chains. This causes a high number of expensive main memory accesses and join predicate evaluations during the probe phase, increasing the runtime of the overall join. A related problem occurs when the query optimizer cannot determine the uniqueness of the join attribute of the build side. We present the 3D Hash Join to efficiently evaluate main-memory hash joins in the presence of duplicate build keys and skew. The main idea is to cluster the hash table collision chains based on the distinct values of the build attribute. We further introduce a technique called deferred unnesting to speed up the evaluation of multiple joins. In an experimental comparison with an implementation of a chaining hash table, our approach achieves a speedup of up to a factor of 3.53 for a single key/foreign key join and 5.67 for many-to-many joins.