Hist-Tree: Those Who Ignore It Are Doomed to Learn
Abstract
Learned indexes have provided a new perspective on the well-studied problem of indexing. Despite early skepticism, recent results have shown significant improvements over traditional data structures. While some have attributed these improvements to an ability to adapt to patterns in the data, we believe that the main advantage of learned indexes comes instead from implicit assumptions (e.g., data sortedness) that are not fully leveraged by the comparison points. In this paper, we show that, simply by incorporating these same basic assumptions, we can design a traditional data structure capable of outperforming learned alternatives. To make our case, we propose a new index called a histogram tree (Hist-Tree), which takes advantage of the sortedness and range of the data. We also present an optimized compact layout based on the read-only assumption made by many learned indexes. Our experiments demonstrate that, in terms of lookup latency, Hist-Tree can beat three stateof-the-art learned indexes, including RMI, PGM-index, and RadixSpline, by as much as 1.8–2.7×.