Volume 17, No. 13

MTSClean: Efficient Constraint-based Cleaning for Multi-Dimensional Time Series Data

Authors:
Xiaoou Ding, Song Yichen, Hongzhi Wang, Chen Wang, Donghua Yang

Abstract

Time series data are widespread in information systems, and their quality issues, particularly the complex interdependencies among attributes and the persistence of errors, pose significant challenges to data cleaning. Existing semantic constraints, such as conditional regression rules and speed constraints, are helpful but remain insufficient for this task. This paper introduces two novel online cleaning methods, MTSClean and MTSClean-soft, designed to improve cleaning efficiency and robustness. By combining row and column constraints, we significantly accelerate the cleaning process, reducing the time complexity of the exact solution MTSClean from $O\big((NM)^{3.5}|\Sigma|\big)$ to $O\big(NM^{3.5}|\Sigma|\big)$. Meanwhile, MTSClean-soft achieves $O(NM^{2})$ time and more precise repairs through an optimized search for key cells and a novel repair cost function. Comparative experiments against nine benchmark methods show that our approach is superior on multiple metrics, completing cleaning tasks faster and performing better than state-of-the-art methods. This demonstrates the practicality and advantage of the proposed methods for cleaning multi-dimensional time series data.
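Taking the stated complexity bounds at face value, the gains can be made concrete by dividing them. This is only arithmetic on the asymptotic bounds quoted above, not a measured speedup. The abstract does not define $N$ and $M$; reading them as the series length and the number of dimensions is an assumption, and the ratios below hold regardless of that reading.

```latex
% Exact MTSClean: old bound divided by new bound
\frac{(NM)^{3.5}\,|\Sigma|}{N M^{3.5}\,|\Sigma|}
  = \frac{N^{3.5} M^{3.5}}{N M^{3.5}}
  = N^{2.5}

% MTSClean (new bound) divided by MTSClean-soft
\frac{N M^{3.5}\,|\Sigma|}{N M^{2}}
  = M^{1.5}\,|\Sigma|

% Example: N = 10^4 gives a bound ratio of (10^4)^{2.5} = 10^{10}
```

So the row/column decomposition removes a factor of $N^{2.5}$ from the exact method's bound, and the soft variant further drops the dependence on $|\Sigma|$ and lowers the exponent on $M$ from 3.5 to 2.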

PVLDB is part of the VLDB Endowment Inc.