Volume 17, No. 11
Distance-based Outlier Query Optimization in Apache IoTDB
Abstract
While outlier detection has been widely studied over streaming data, querying outliers in time series databases has been largely overlooked. Apache IoTDB, an open-source time series database, employs LSM-tree based storage to support write-intensive workloads, yet this storage structure unfortunately hinders the performance of outlier queries. Because of long-delayed data arrivals, which streaming outlier detection simply discards, the data points of a time series may be stored in multiple files with overlapping time ranges. Given these overlapping time ranges, outliers cannot be detected in each file independently and then merged to form the result. In this paper, we focus on optimizing the efficiency of distance-based outlier queries in Apache IoTDB, taking into account the overlapping files caused by delayed data. We propose to utilize bucket statistics of the values stored in files. Upper and lower bounds on the neighbor counts of data points are derived from the buckets and overlapping files for efficient pruning. Extensive experiments demonstrate the efficiency of our proposal in the LSM-tree based time series database Apache IoTDB, compared to existing outlier detection methods designed for data streams.
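The pruning idea can be illustrated with the classic (r, k) distance-based outlier definition: a point is an outlier if fewer than k other points lie within distance r of it. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes that distance is the absolute difference of values (timestamps are ignored), that the bucket width equals r, and that each file's values are summarized as bucket counts summed across files (so no point may be stored twice). All function names are hypothetical. Two points in the same bucket are always neighbors, which gives a lower bound. Neighbors can only lie in the same or an adjacent bucket, which gives an upper bound. Only points whose bounds straddle k need an exact count.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def bucket_counts(values: Iterable[float], r: float) -> Counter:
    """Hypothetical per-file statistics: number of values in each bucket of width r."""
    return Counter(math.floor(v / r) for v in values)


def merge_counts(per_file: Iterable[Counter]) -> Counter:
    """Sum bucket counts over (possibly time-overlapping) files.
    Assumption: no data point is stored in more than one file."""
    total = Counter()
    for counts in per_file:
        total.update(counts)
    return total


def neighbor_bounds(bucket: int, counts: Counter) -> Tuple[int, int]:
    """Bounds on the neighbor count of a point in `bucket` (the point itself excluded).
    Same bucket: value gap < r, so all are neighbors (lower bound).
    Buckets two or more apart: value gap > r, so only b-1, b, b+1 can hold neighbors (upper bound)."""
    lower = counts[bucket] - 1
    upper = counts[bucket - 1] + counts[bucket] + counts[bucket + 1] - 1
    return lower, upper


def distance_outliers(files: List[List[float]], r: float, k: int) -> List[float]:
    """Return values with fewer than k neighbors within distance r, pruning with bucket bounds."""
    counts = merge_counts(bucket_counts(f, r) for f in files)
    all_values = [v for f in files for v in f]
    outliers = []
    for v in all_values:
        lower, upper = neighbor_bounds(math.floor(v / r), counts)
        if lower >= k:
            continue  # pruned: certainly an inlier
        if upper < k:
            outliers.append(v)  # pruned: certainly an outlier
            continue
        # Bounds are inconclusive: count neighbors exactly (minus the point itself).
        exact = sum(1 for w in all_values if abs(w - v) <= r) - 1
        if exact < k:
            outliers.append(v)
    return outliers
```

For example, with `files = [[1.0, 1.2, 1.4, 9.0], [1.1, 1.3, 5.0]]`, `r = 0.5` and `k = 2`, the five values near 1 fall into one bucket and are pruned as inliers, while 9.0 and 5.0 are pruned as outliers without any exact distance computation. A real implementation would scan only the adjacent buckets in the exact step instead of every value.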
PVLDB is part of the VLDB Endowment Inc.