
Volume 15, No. 6

Multivariate correlations discovery in static and streaming data

Authors:
Koen Minartz (Eindhoven University of Technology), Jens d'Hondt (TU Eindhoven), Odysseas Papapetrou (TU Eindhoven)*

Abstract

Correlation analysis is an invaluable tool in many domains for better understanding data and extracting salient insights. Most work to date focuses on detecting high pairwise correlations: pairs of variables/vectors that have a high Pearson correlation coefficient. A generalization of this problem, with known applications but no known efficient solutions, is the discovery of significant multivariate correlations, i.e., sets of vectors (typically 3 to 5) that exhibit a strong dependence when considered together. In this work we propose efficient one-shot and streaming algorithms for detecting multivariate correlations. Our algorithms rely on novel theoretical results, work with two different correlation measures, and support the addition of user constraints. Our extensive experimental evaluation with 5 datasets examines the properties of our solution and demonstrates that our algorithms outperform the state of the art, typically by an order of magnitude.
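To make the difference between pairwise and multivariate correlation concrete, the sketch below builds three vectors where neither x1 nor x2 is meaningfully correlated with y on its own, yet their combination matches y exactly. The abstract does not name the two correlation measures the paper uses, so the score here is an assumption chosen only for illustration: the Pearson correlation between the element-wise sum of a group of vectors and a target vector. The function names `pearson` and `aggregated_correlation` are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(vx * vy)


def aggregated_correlation(group: Sequence[Sequence[float]],
                           target: Sequence[float]) -> float:
    """Hypothetical multivariate score for illustration only: Pearson
    correlation between the element-wise sum of `group` and `target`."""
    summed = [sum(values) for values in zip(*group)]
    return pearson(summed, target)


# x1 and x2 are each (nearly) uncorrelated with y, but x1 + x2 equals y.
x1 = [11.0, 9.0, -9.0, -11.0]
x2 = [-10.0, -10.0, 10.0, 10.0]
y = [1.0, -1.0, 1.0, -1.0]

print(round(pearson(x1, y), 3))                       # 0.1
print(round(pearson(x2, y), 3))                       # 0.0
print(round(aggregated_correlation([x1, x2], y), 3))  # 1.0
```

A search restricted to high pairwise correlations would miss this triple entirely, which is the gap that multivariate correlation discovery targets. Enumerating every candidate group this way grows combinatorially with the group size, which is why naive evaluation does not scale.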

PVLDB is part of the VLDB Endowment Inc.
