go back

Volume 15, No. 7

The Inherent Time Complexity and An Efficient Algorithm for Subsequence Matching Problem

Authors:
Zemin Chao (Harbin institute of technology)* Hong Gao (Harbin Institute of Technology) Yinan An (Harbin institute of technology) Jianzhong Li (Harbin Institute of Technology)

Abstract

Subsequence matching is an important and fundamental problem for analyzing time series data. Many algorithms have been proposed to solve this problem. However, the efficiency of existing algorithms is lower for big time series data and is sensitive to the constraint on mean values and standard deviations in $z$-normalized series. This paper studies the inherent time complexity of the problem and designs a more efficient algorithm for solving the problem. Firstly, it is proved that the subsequence matching problem is incomputable in time $O(n^{1-\delta})$ even allowing polynomial time preprocessing if the hypothesis SETH is true, where $n$ is the size of the input time series and $0 \leq \delta <1$, i.e. , the inherent complexity of the subsequence matching problem is $\omega (n^{1-\delta})$. Secondly, an efficient algorithm for subsequence matching problem is proposed. In order to improve the efficiency of the algorithm, a novel summarization method for time series is proposed, and a novel index to filter out the unpromising candidates is built based on the summarization method. Using the new summarization method and the new index, an efficient algorithm for the subsequence matching problem is designed, which supports both Euclidean Distance and DTW distance with or without \textit{z}-normalization. Experimental results show that the proposed algorithm is up to about 3 $\sim$ 10 times faster than the state of art algorithm KV-match on the constrained \textit{z}-normalized Euclidean Distance and DTW distance, and is up to 7 $\sim$ 12 times faster than KV-match on Euclidean Distance.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy