Efficient Approximate Query Processing with Block Sampling
Abstract
Approximate query processing (AQP) has been widely studied as a way to accelerate online analytical query processing while maintaining high accuracy. Many existing methods reduce data processing costs through record-level sampling. However, because data systems typically access data in pages, a record-level sample can touch nearly every page, so its data loading cost can be as high as that of the exact query and often becomes the bottleneck of query processing. In this work, we present B-AQP, an AQP framework based on block sampling that significantly reduces data loading costs while providing a priori error guarantees. Our preliminary evaluation across various data systems and workloads demonstrates that B-AQP accelerates query execution by up to 185× compared to uniform sampling and by four orders of magnitude compared to exact queries, all with guaranteed error bounds.
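To illustrate why record-level sampling saves little data loading, consider a back-of-the-envelope sketch (it is not part of B-AQP itself). Assume each row is sampled independently with probability p and every page holds r rows. A page must be loaded whenever at least one of its rows is sampled, which happens with probability 1 - (1 - p)^r, whereas block sampling at rate p loads a fraction p of the pages. The function names and the parameter values below are hypothetical, chosen only for illustration.

```python
def row_sampling_page_fraction(rate: float, rows_per_page: int) -> float:
    """Expected fraction of pages loaded under independent (Bernoulli) row sampling.

    Assumption: each row is sampled independently with probability `rate`,
    and every page stores exactly `rows_per_page` rows.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    # A page is skipped only if none of its rows is sampled.
    return 1.0 - (1.0 - rate) ** rows_per_page


def block_sampling_page_fraction(rate: float) -> float:
    """Expected fraction of pages loaded when whole pages are sampled at `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return rate


if __name__ == "__main__":
    # Hypothetical setting: 1% sampling rate, 100 rows per page.
    rate, rows_per_page = 0.01, 100
    print("row sampling  :", row_sampling_page_fraction(rate, rows_per_page))
    print("block sampling:", block_sampling_page_fraction(rate))
```

Under these assumptions, a 1% row sample with 100 rows per page still loads well over half of the pages, while a 1% block sample loads 1% of them. Rows within a block may be correlated, so a block sample generally needs its own error analysis.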