Time Series Data Encoding for Efficient Storage: A Comparative Analysis in Apache IoTDB

Authors:

Jinzhao Xiao (Tsinghua University), Yuxiang Huang (Tsinghua University), Changyu Hu (Tsinghua University), Shaoxu Song (Tsinghua University)*, Huang Xiangdong (Tsinghua University), Jianmin Wang (Tsinghua University, China)


Abstract

Not only the vast applications but also the distinct features of time series data stimulate the booming growth of time series database management systems, such as Apache IoTDB, InfluxDB, and OpenTSDB. Almost all of these systems employ columnar storage with effective encoding of time series data. Given the distinct features of various time series data, it is not surprising that different encoding strategies may perform differently. In this study, we first summarize the features of time series data that may affect encoding performance, including scale, delta, repeat and increase. Then, we introduce the storage scheme of a typical time series database, Apache IoTDB, which prescribes the limits on implementing encoding algorithms in the system. A qualitative analysis of encoding effectiveness with regard to various data features is then presented for the studied algorithms. To this end, we develop a benchmark for evaluating encoding algorithms, including a data generator based on the aforesaid data features and several real-world datasets from our industrial partners. Finally, we present an extensive experimental evaluation using the benchmark. Remarkably, a quantitative analysis of encoding effectiveness with regard to various data features is conducted in Apache IoTDB.

PVLDB is part of the VLDB Endowment Inc.


Volume 15, No. 10
