Hercules Against Data Series Similarity Search

Authors:

Karima Echihabi (Mohammed VI Polytechnic University)*; Panagiota Fatourou (University of Crete); Kostas Zoumpatianos (Snowflake Computing); Themis Palpanas (University of Paris); Houda Benbrahim (ENSIAS, Université Mohammed V de Rabat)


Abstract

In this paper, we propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using a variety of synthetic and real datasets and query workloads of varying difficulty. The results show that Hercules is up to one order of magnitude faster than the best competitor (which is not always the same technique). Moreover, Hercules is the only index that outperforms the optimized scan in all scenarios, including hard query workloads on disk-based datasets.
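To make the idea of summary-based exact search concrete, here is a minimal, single-threaded sketch. It does not reproduce Hercules: the specific summarization techniques, tree layout, disk handling, and parallel/SIMD scheduling described in the paper are not modeled. Instead, it assumes in-memory lists of equal-length series and uses PAA (Piecewise Aggregate Approximation) as a stand-in summary. It shows the general principle that exact search relies on: a summary yields a lower bound on the true Euclidean distance, so any candidate whose bound is at least the best distance found so far can be skipped without losing exactness. It also shows early abandoning, a common optimization in scan-based methods, where a distance computation stops as soon as it exceeds that best distance. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def paa(series: Sequence[float], segments: int) -> List[float]:
    """PAA summary: the mean of each equal-length segment.
    Assumption: len(series) is divisible by `segments`."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by the number of segments")
    step = n // segments
    return [sum(series[i:i + step]) / step for i in range(0, n, step)]


def paa_lower_bound(q_paa: Sequence[float], c_paa: Sequence[float], n: int) -> float:
    """Lower bound on the Euclidean distance between two length-n series,
    computed only from their PAA summaries: sqrt(n/w) * ||q_paa - c_paa||."""
    w = len(q_paa)
    return math.sqrt(n / w) * math.sqrt(sum((a - b) ** 2 for a, b in zip(q_paa, c_paa)))


def euclidean_early_abandon(q: Sequence[float], c: Sequence[float], best_so_far: float) -> float:
    """Euclidean distance that gives up (returns inf) once the running
    squared sum exceeds best_so_far**2, since c can no longer win."""
    limit = best_so_far ** 2
    acc = 0.0
    for a, b in zip(q, c):
        acc += (a - b) ** 2
        if acc > limit:
            return math.inf
    return math.sqrt(acc)


def exact_nn(query: Sequence[float], collection: List[Sequence[float]],
             segments: int = 4) -> Tuple[int, float]:
    """Exact 1-nearest-neighbor search with lower-bound pruning."""
    n = len(query)
    q_paa = paa(query, segments)
    # In a real index these summaries would be precomputed and stored.
    bounds = [paa_lower_bound(q_paa, paa(s, segments), n) for s in collection]
    # Visit candidates in increasing order of their lower bound.
    order = sorted(range(len(collection)), key=lambda i: bounds[i])
    best_idx, best_dist = -1, math.inf
    for i in order:
        if bounds[i] >= best_dist:
            break  # every remaining bound is at least as large: none can improve
        d = euclidean_early_abandon(query, collection[i], best_dist)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist
```

For example, `exact_nn([1, 2, 3, 4, 5, 6, 7, 8], [[0] * 8, [1, 2, 3, 4, 5, 6, 7, 9], [8, 7, 6, 5, 4, 3, 2, 1]])` returns `(1, 1.0)`. Only the second series has its full distance computed, because the bounds of the other two already exceed 1.0. Since the bound never overestimates the true distance, the answer is always identical to that of a brute-force scan; only the amount of work changes.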

PVLDB is part of the VLDB Endowment Inc.


PVLDB Volume 15, No. 10
