Volume 14, No. 11

SKT: A One-Pass Multi-Sketch Data Analytics Accelerator

Authors:
Monica Chiosa (ETH Zurich), Thomas B Preußer (Accemic Technologies), Gustavo Alonso (ETH Zurich)

Abstract

Data analysts often need to characterize a data stream as a first step before processing it further. Typical initial insights include the cardinality of the data set and its frequency distribution. Such information is usually extracted with sketch algorithms, which are now widely used to process very large data sets in limited space and in a single pass over the data. Analysts, however, often need more than one parameter to characterize a stream, and computing multiple sketches is expensive even on high-end CPUs. Exploiting the growing specialization of compute infrastructure through hardware accelerators, this paper proposes SKT, an FPGA-based bump-in-the-wire accelerator that computes several sketches, along with basic statistics such as average, maximum, and minimum, in a single pass over the data. SKT characterizes a data set by calculating its cardinality, its second frequency moment, and its frequency distribution. The design processes streams at a TCP/IP line rate of 100 Gbps and is built to fit emerging cloud service architectures such as Microsoft’s Catapult and Amazon’s AQUA. The paper explores the trade-offs of implementing sketch algorithms on a spatial architecture and of combining several of them into a single design. Extensive experiments show that the FPGA-accelerated SKT implementation achieves a significant performance gain over high-end, server-class CPUs.
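The abstract does not name the sketch algorithms SKT uses. As an illustration only, the Python sketch below assumes three common choices: HyperLogLog for cardinality, Count-Min for the frequency distribution, and Fast-AGMS for the second frequency moment. It shows how a single pass over a stream can update all of them together with count, sum, minimum, and maximum. This is a software model, not the FPGA design; the class `MultiSketch`, the helper `hash64`, and all sketch sizes are hypothetical.

```python
import hashlib
import math
import statistics


def hash64(value, seed):
    """Deterministic 64-bit hash of `value`, keyed by an integer `seed` (BLAKE2b)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little")


class MultiSketch:
    """Hypothetical model: every sketch is updated from the same single pass."""

    def __init__(self, hll_bits=12, cm_depth=4, cm_width=2048,
                 agms_depth=5, agms_width=1024):
        self.p = hll_bits
        self.hll = [0] * (1 << hll_bits)                            # HyperLogLog registers
        self.cm = [[0] * cm_width for _ in range(cm_depth)]         # Count-Min counters
        self.agms = [[0] * agms_width for _ in range(agms_depth)]   # Fast-AGMS counters
        self.count, self.total = 0, 0
        self.minimum, self.maximum = math.inf, -math.inf

    def update(self, x):
        # Basic statistics.
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        # HyperLogLog: the top p bits select a register; the rank is the
        # position of the first 1-bit in the remaining 64 - p bits.
        h = hash64(x, 0)
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.hll[idx] = max(self.hll[idx], rank)
        # Count-Min: one independently seeded hash per row.
        for r, row in enumerate(self.cm):
            row[hash64(x, 1 + r) % len(row)] += 1
        # Fast-AGMS: bucket from the low bits, +/-1 sign from the top bit.
        for r, row in enumerate(self.agms):
            g = hash64(x, 100 + r)
            row[g % len(row)] += 1 if g >> 63 else -1

    def cardinality(self):
        m = len(self.hll)
        alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -reg for reg in self.hll)
        zeros = self.hll.count(0)
        if estimate <= 2.5 * m and zeros:   # small-range (linear counting) correction
            estimate = m * math.log(m / zeros)
        return estimate

    def frequency(self, x):
        # Count-Min never underestimates; take the least-inflated row.
        return min(row[hash64(x, 1 + r) % len(row)] for r, row in enumerate(self.cm))

    def second_moment(self):
        # Each row's sum of squared counters estimates F2; take the median.
        return statistics.median(sum(c * c for c in row) for row in self.agms)

    def mean(self):
        return self.total / self.count


sk = MultiSketch()
for i in range(20000):
    sk.update(i % 1000)   # 1000 distinct values, each seen 20 times (true F2 = 400000)
print(round(sk.cardinality()), sk.frequency(7), round(sk.second_moment()))
print(sk.minimum, sk.maximum, sk.mean())
```

Each item is hashed once per sketch row and all state is updated in place, so the whole characterization needs only one pass over the stream. An accelerator can exploit that structure by running the independent updates side by side.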

PVLDB is part of the VLDB Endowment Inc.