Volume 17, No. 12

Clean4TSDB: A Data Cleaning Tool for Time Series Databases

Authors:
Xiaoou Ding, Song Yichen, Hongzhi Wang, Donghua Yang, Chen Wang, Jianmin Wang

Abstract

Billions of data points are generated by devices equipped with thousands of sensors, leading to significant data quality issues in time series data. These errors not only complicate time series data management but also compromise the accuracy and reliability of analysis based on such data. Given the noteworthy characteristics of time series data, existing cleaning methods struggle to provide adequate repairs, and tools supporting expressive constraints for time series remain scarce. To address this, we develop Clean4TSDB, a specialized data cleaning system for time series databases. This system integrates three key modules: expressive data quality constraint discovery, violation detection, and multivariate time series repairing, forming a comprehensive “profiling-detection-repair” workflow. Technically, we introduce TSDD, a data quality constraint that effectively captures contextual relationships within multivariate time series, and implement an efficient algorithm for its automated mining. Leveraging both row- and column-based constraints, we propose an effective time series cleaning algorithm. From a system standpoint, Clean4TSDB is pre-configured for seamless integration with time series databases like Apache IoTDB. Using user-provided and algorithmically mined constraints, it effectively identifies various error patterns and offers reliable cleaning solutions. Furthermore, we establish a comprehensive library of state-of-the-art time series repair algorithms to meet the diverse needs of different management scenarios.
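
As a rough illustration of the detection and repair stages of such a workflow, the following Python sketch checks two hand-written constraints on a two-sensor series and repairs flagged values by linear interpolation. It is not the TSDD constraint or the Clean4TSDB algorithm, neither of which is specified in this abstract. The sketch assumes that a "row" constraint relates values recorded at the same timestamp and a "column" constraint bounds how fast one sensor may change over time; the function names, the thresholds `max_gap` and `max_rate`, and the sample data are all hypothetical.

```python
from typing import List, Tuple

# One row per timestamp: (timestamp, sensor_a, sensor_b). Hypothetical layout.
Row = Tuple[int, float, float]


def detect_violations(rows: List[Row], max_rate: float, max_gap: float) -> List[int]:
    """Return indices of rows that violate either illustrative constraint.

    Row constraint (assumed):    |sensor_a - sensor_b| <= max_gap at each timestamp.
    Column constraint (assumed): sensor_a changes by at most max_rate per time unit,
                                 measured against the most recent unflagged row.
    """
    flagged: List[int] = []
    last_ok = None  # index of the most recent row that passed both checks
    for i, (t, a, b) in enumerate(rows):
        row_bad = abs(a - b) > max_gap
        col_bad = False
        if last_ok is not None:
            t0, a0, _ = rows[last_ok]
            col_bad = abs(a - a0) > max_rate * (t - t0)
        if row_bad or col_bad:
            flagged.append(i)
        else:
            last_ok = i
    return flagged


def repair(rows: List[Row], flagged: List[int]) -> List[Row]:
    """Replace sensor_a in flagged rows by interpolating between clean neighbours.

    Assumes the error lies in sensor_a; a real repair algorithm would also have
    to decide which attribute to change.
    """
    bad = set(flagged)
    good = [i for i in range(len(rows)) if i not in bad]
    repaired = list(rows)
    for i in flagged:
        prev = max((g for g in good if g < i), default=None)
        nxt = min((g for g in good if g > i), default=None)
        t, a, b = rows[i]
        if prev is not None and nxt is not None:
            t0, a0, _ = rows[prev]
            t1, a1, _ = rows[nxt]
            a = a0 + (a1 - a0) * (t - t0) / (t1 - t0)
        elif prev is not None:
            a = rows[prev][1]  # no later clean row: carry the last clean value forward
        elif nxt is not None:
            a = rows[nxt][1]   # no earlier clean row: use the next clean value
        repaired[i] = (t, a, b)
    return repaired


# Hypothetical sample: the value 50.0 at timestamp 2 is an injected spike.
sample = [(0, 10.0, 10.5), (1, 11.0, 11.2), (2, 50.0, 12.1), (3, 13.0, 13.4), (4, 14.0, 14.1)]
violations = detect_violations(sample, max_rate=2.0, max_gap=1.0)
cleaned = repair(sample, violations)
```

Here the constraints are supplied by hand. In the workflow the abstract describes, a profiling step would mine them from the data before detection runs.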

PVLDB is part of the VLDB Endowment Inc.