Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation.
Arnd Christian König, Gerhard Weikum:
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation.
VLDB 1999: 423-434@inproceedings{DBLP:conf/vldb/KonigW99,
author = {Arnd Christian K{\"o}nig and
Gerhard Weikum},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Combining Histograms and Parametric Curve Fitting for Feedback-Driven
Query Result-size Estimation},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-7},
pages = {423-434},
ee = {db/conf/vldb/KonigW99.html},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
This paper aims to improve the accuracy of query result-size estimations in query
optimizers by leveraging the dynamic feedback obtained from observations on the
executed query workload. To this end, an approximate "synopsis" of data-value
distributions is devised that combines histograms with parametric curve fitting,
leading to a specific class of linear splines. The approach reconciles the benefits
of histograms, simplicity and versatility, with those of parametric techniques
especially the adaptivity to statistically biased and dynamically evolving query
workloads.
The paper presents efficient algorithms for constructing the linear-spline synopsis for data-value distributions from a moving window of the most recent observations on (the most critical) query executions. The approach is worked out in full detail for capturing frequency as well as density distributions of data values, and it is shown how result size estimations are inferred for exact-match and range queries as well as projections and grouping. To a large extent, the developed methods can be generalized to multi-dimensional distributions, thus bearing the ability to capture correlations among attributes as well. Experimental studies underline the accuracy of the developed estimation methods, outperforming the best known classes of histograms.
Copyright © 1999 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.):
VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK.
Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents
References
- [1]
- Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik:
The New Jersey Data Reduction Report.
IEEE Data Eng. Bull. 20(4): 3-45(1997)
- [2]
- Surajit Chaudhuri:
An Overview of Query Optimization in Relational Systems.
PODS 1998: 34-43
- [3]
- Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
Random Sampling for Histogram Construction: How much is enough?
SIGMOD Conference 1998: 436-447
- [4]
- Chung-Min Chen, Nick Roussopoulos:
Adaptive Selectivity Estimation Using Query Feedback.
SIGMOD Conference 1994: 161-172
- [5]
- ...
- [6]
- ...
- [7]
- Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, Abraham Silberschatz:
Bifocal Sampling for Skew-Resistant Join Size Estimation.
SIGMOD Conference 1996: 271-281
- [8]
- Phillip B. Gibbons, Yossi Matias:
New Sampling-Based Summary Statistics for Improving Approximate Query Answers.
SIGMOD Conference 1998: 331-342
- [9]
- ...
- [10]
- Phillip B. Gibbons, Yossi Matias, Viswanath Poosala:
Fast Incremental Maintenance of Approximate Histograms.
VLDB 1997: 466-475
- [11]
- ...
- [12]
- Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami:
Selectivity and Cost Estimation for Joins Based on Random Sampling.
J. Comput. Syst. Sci. 52(3): 550-569(1996)
- [13]
- Yannis E. Ioannidis:
Universality of Serial Histograms.
VLDB 1993: 256-267
- [14]
- Yannis E. Ioannidis, Stavros Christodoulakis:
On the Propagation of Errors in the Size of Join Results.
SIGMOD Conference 1991: 268-277
- [15]
- Yannis E. Ioannidis, Stavros Christodoulakis:
Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results.
ACM Trans. Database Syst. 18(4): 709-748(1993)
- [16]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244
- [17]
- H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel:
Optimal Histograms with Quality Guarantees.
VLDB 1998: 275-286
- [18]
- Navin Kabra, David J. DeWitt:
Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans.
SIGMOD Conference 1998: 106-117
- [19]
- Yibei Ling, Wei Sun:
An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems.
ICDE 1995: 532-539
- [20]
- Michael V. Mannino, Paicheng Chu, Thomas Sager:
Statistical Profile Estimation in Database Systems.
ACM Comput. Surv. 20(3): 191-221(1988)
- [21]
- Yossi Matias, Jeffrey Scott Vitter, Min Wang:
Wavelet-Based Histograms for Selectivity Estimation.
SIGMOD Conference 1998: 448-459
- [22]
- ...
- [23]
- ...
- [24]
- Viswanath Poosala:
Histogram-Based Estimation Techniques in Database Systems.
Ph.D. thesis, Univ. of Wisconsin-Madison 1997
- [25]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita:
Improved Histograms for Selectivity Estimation of Range Predicates.
SIGMOD Conference 1996: 294-305
- [26]
- ...
- [27]
- ...
- [28]
- Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng:
An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment.
SIGMOD Conference 1993: 79-88
- [29]
- ...
Copyright © Fri Mar 12 17:22:57 2010
by Michael Ley (ley@uni-trier.de)