go back

Volume 17, No. 1

Efficient Dynamic Weighted Set Sampling and Its Extension

Authors:
Fangyuan Zhang, Mengxu Jiang, Sibo Wang

Abstract

Given a weighted set 𝑆 of 𝑛 elements, weighted set sampling (WSS) samples an element in 𝑆 so that each element 𝑎_𝑖 is sampled with a probability proportional to its weight 𝑤(𝑎_𝑖). The classic alias method pre-processes an index in 𝑂(𝑛) time with 𝑂(𝑛) space and handles WSS with 𝑂(1) time. Yet, the alias method does not support dynamic updates. By minor modifications of existing dynamic WSS schemes, it is possible to achieve an expected 𝑂(1) update time and draw 𝑡 independent samples in expected 𝑂(𝑡) time with linear space, which is theoretically optimal. But such a method is impractical and even slower than a binary search tree-based solution. How to support both efficient sampling and updates in practice is still challenging. Motivated by this, we design BUS, an efficient scheme that handles an update in 𝑂(1) amortized time and draws 𝑡 independent samples in 𝑂(log 𝑛 + 𝑡) time with linear space. A natural extension of WSS is the weighted independent range sampling (WIRS), where each element in 𝑆 is a data point from R. Given an arbitrary range 𝑄 = [l, 𝑟] at query time, WIRS aims to do weighted set sampling on the set 𝑆_𝑄. We show that by integrating the theoretically optimal dynamic WSS scheme mentioned above, it can handle an update in 𝑂(log 𝑛) time and can draw 𝑡 independent samples for WIRS in 𝑂(log 𝑛 + 𝑡) time, the same as the state-of-the-art static algorithm. Again, such a solution by integrating the optimal dynamic WSS scheme is still impractical to handle WIRS queries. We further propose WIRS-BUS to integrate BUS to handle WIRS queries, which handles each update in 𝑂(log 𝑛) time and draws 𝑡 independent samples in 𝑂(log^2 𝑛 + 𝑡) time with linear space. Extensive experiments show that our BUS and WIRS-BUS are efficient for both sampling and updates.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy