@inproceedings{DBLP:conf/vldb/KoudasMJ99,
  author    = {H. V. Jagadish and Nick Koudas and S. Muthukrishnan},
  editor    = {Malcolm P. Atkinson and Maria E. Orlowska and Patrick Valduriez and Stanley B. Zdonik and Michael L. Brodie},
  title     = {Mining Deviants in a Time Series Database},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {102-113},
  ee        = {db/conf/vldb/KoudasMJ99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Identifying outliers is an important data analysis function. Statisticians have long studied techniques to identify outliers in a data set in the context of fitting the data to some model. In the case of time series data, the situation is murkier. For instance, the ``typical'' value could ``drift'' up or down over time, so the extrema may not necessarily be interesting. We wish to identify data points that are somehow anomalous or ``surprising''.
We formally define the notion of a deviant in a time series, based on a representation sparsity metric. We develop an efficient algorithm to identify deviants in a time series. We demonstrate how this technique can be used to locate interesting artifacts in time series data, and present experimental evidence of the value of our technique.
As a side benefit, our algorithm is able to produce histogram representations of data that have substantially lower error than ``optimal histograms'' for the same total storage, including both histogram buckets and the deviants stored separately. This is of independent interest for selectivity estimation.
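To make the idea concrete, the following is a minimal brute-force sketch of one plausible reading of the abstract, not the paper's efficient algorithm. It assumes that the ``representation sparsity metric'' is the sum-of-squared-error of a histogram with a fixed number of contiguous buckets, and that deviants are the points whose separate storage reduces that error the most. The function names `sse`, `histogram_error`, and `find_deviants` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations of values from their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def histogram_error(series, buckets, removed=frozenset()):
    """Minimum total SSE over all splits of `series` into at most `buckets`
    contiguous buckets, ignoring the positions listed in `removed`.
    Assumption: this stands in for the representation sparsity metric."""
    n = len(series)

    def cost(i, j):
        # Error of one bucket covering positions i..j-1, skipping removed points.
        return sse([series[t] for t in range(i, j) if t not in removed])

    inf = float("inf")
    # best[b][j]: minimum error of covering series[:j] with exactly b buckets.
    best = [[inf] * (n + 1) for _ in range(buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, buckets + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if best[b - 1][i] < inf:
                    best[b][j] = min(best[b][j], best[b - 1][i] + cost(i, j))
    return min(best[b][n] for b in range(1, buckets + 1))


def find_deviants(series, buckets, k):
    """Exhaustive search (exponential in k, illustration only): the k positions
    whose removal minimizes the histogram error of the remaining points."""
    return min(
        combinations(range(len(series)), k),
        key=lambda idx: histogram_error(series, buckets, frozenset(idx)),
    )


series = [1, 1, 1, 9, 1, 1, 5, 5, 5, 5]
print(find_deviants(series, buckets=2, k=1))
```

On this toy series, the value 9 at index 3 is the single deviant: once it is stored separately, two buckets represent the remaining points with zero error, whereas two buckets alone cannot. The exhaustive search is only practical for tiny inputs and serves to illustrate the definition under the stated assumptions.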
Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.