go back

Volume 15, No. 9

New Wine in an Old Bottle: Data-aware Hash Functions for Bloom Filters

Authors:
Arindam Bhattacharya (IIT DELHI)* Chathur Gudesa (Indian Institute of Technology Delhi) Amitabha Bagchi (IIT Delhi) Srikanta Bedathur (IIT Delhi)

Abstract

In many applications of Bloom filters, it is possible to exploit the patterns present in the inserted and non-inserted keys to achieve more compression than the standard Bloom filter. A new class of Bloom filters called Learned Bloom filters use machine learning models to exploit these patterns in the data. In practice, these methods and their variants raise many practical issues: the choice of machine learning models, the training paradigm to achieve the desired results, the choice of thresholds, the number of partitions in case multiple partitions are used, and other such design decisions. In this paper, we present a simple partitioned Bloom filter that works as follows: we partition the Bloom filter into segments, each of which uses a simple projection-based hash function computed using the data. We also provide a theoretical analysis that provides a principled way to select the design parameters of our method: number of hash functions and number of bits per partition. We perform empirical evaluations of our methods on various real-world datasets spanning several applications. We show that it can achieve an improvement in false positive rates of up to two orders of magnitude over standard Bloom filters for the same memory usage, and up to 50% better compression (bytes used per key) for same FPR, and, consistently beats the variants of learned Bloom filters.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy