go back

Volume 17, No. 8

Rapidash: Efficient Detection of Constraint Violations

Authors:
Zifan Liu, Shaleen Deep, Anna Fariha, Fotis Psallidas, Ashish Tiwari, Avrilia Floratou

Abstract

Denial Constraint (DC) is a well-established formalism that captures a wide range of integrity constraints commonly encountered, including candidate keys, functional dependencies, and ordering constraints, among others. Given their significance, there has been considerable research interest in achieving fast detection of DC violations, especially to support activities related to data exploration and preparation. Despite the significant advancements in the field, prior work exhibits notable limitations when confronted with large-scale datasets: the current state-of-the-art algorithm demonstrates a quadratic (worst-case) time and space complexity relative to the dataset’s number of rows. In this paper, we establish a connection between orthogonal range search and DC violation detection. We then introduce Rapidash, a novel algorithm that demonstrates near-linear time and space complexity, representing a theoretical improvement over prior work. To validate the effectiveness of our algorithm, we conduct comprehensive evaluations on both open-source and real-world production datasets, with our production datasets notably being an order of magnitude larger than the datasets employed in prior studies. Our results reveal that Rapidash achieves up to 84× faster performance compared to state-of-the-art approaches while also exhibiting superior scalability.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy