VIP Hashing - Adapting to Skew in Popularity of Data on the Fly

Authors:

Aarati Kakaraparthy (University of Wisconsin, Madison)*, Jignesh Patel (UW - Madison), Brian Kroth (Microsoft), Kwanghyun Park (Microsoft Gray Systems Lab)

Abstract

Not all data is equally popular. Often, some portion of the data is accessed more frequently than the rest, which skews the popularity of the data items. Adapting to this skew can improve performance, and this topic has been studied extensively for disk-based settings. In this work, we consider an in-memory data structure, namely the hash table, and show how one can leverage the skew in popularity for higher performance. Hashing is a low-latency operation, sensitive to the effects of caching, branch prediction, and code complexity, among other factors. These factors make learning in-the-loop especially challenging, as the overhead of performing any additional operations can be significant. In this paper, we propose VIP hashing, a hash table method that uses lightweight mechanisms to learn the popularity distribution of keys and adapt to the skew in popularity on the fly. It controls the overhead by sensing changes in the popularity distribution and dynamically switching the learning mechanism on and off as needed. We test VIP hashing against a variety of workloads generated by Wiscer, a homegrown hashing measurement tool, and find that it improves performance in the presence of skew (a 22% increase in fetch operation throughput for a hash table with one million entries under low skew) while remaining robust in the presence of inserts, deletes, and changing popularity distributions of keys.
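To make the general idea concrete, here is a minimal sketch of a chained hash table that counts fetches per key and moves popular keys toward the front of their bucket, with a flag that turns the learning on or off. It is an illustration under stated assumptions, not the VIP hashing algorithm from the paper: the class name `SkewAwareTable`, the per-entry counters, the bubble-forward reordering, and the manual `learning` flag are all hypothetical choices made for this example.

```python
class SkewAwareTable:
    """Toy chained hash table that favors frequently fetched keys.

    Hypothetical illustration only; not the paper's VIP hashing design.
    """

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        # Assumed switch: when False, fetches skip all bookkeeping.
        self.learning = True

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for entry in bucket:
            if entry[0] == key:
                entry[1] = value  # overwrite an existing key
                return
        bucket.append([key, value, 0])  # [key, value, fetch count]

    def fetch(self, key):
        bucket = self._bucket(key)
        for i, entry in enumerate(bucket):
            if entry[0] == key:
                if self.learning:
                    entry[2] += 1
                    # Move the entry forward past less popular neighbors.
                    while i > 0 and bucket[i - 1][2] < entry[2]:
                        bucket[i - 1], bucket[i] = bucket[i], bucket[i - 1]
                        i -= 1
                return entry[1]
        return None

    def probes(self, key):
        """Number of entries examined to find key, or None if absent."""
        for i, entry in enumerate(self._bucket(key)):
            if entry[0] == key:
                return i + 1
        return None


# Usage: a single bucket forces all keys to collide.
table = SkewAwareTable(num_buckets=1)
for k in ("a", "b", "c"):
    table.insert(k, k.upper())
for _ in range(3):
    table.fetch("c")  # "c" is the popular key
print(table.probes("c"))
```

In this sketch the popular key ends up at the head of its bucket, so later fetches for it examine fewer entries. Setting `learning = False` removes the counting and swapping from the fetch path, which mirrors, in a very simplified way, the idea of switching the learning mechanism off when it is not paying for itself.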

PVLDB is part of the VLDB Endowment Inc.

Volume 15, No. 10
