Random Sampling for Histogram Construction: How much is enough?
Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
Random Sampling for Histogram Construction: How much is enough?
SIGMOD Conference 1998: 436-447@inproceedings{DBLP:conf/sigmod/ChaudhuriMN98,
author = {Surajit Chaudhuri and
Rajeev Motwani and
Vivek R. Narasayya},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Random Sampling for Histogram Construction: How much is enough?},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-995-5},
pages = {436-447},
ee = {http://doi.acm.org/10.1145/276304.276343, db/conf/sigmod/ChaudhuriMN98.html},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Random sampling is a standard technique for constructing (approximate)
histograms for query optimization. However, any real implementation in
commercial products requires solving the hard problem of determining
"How much sampling is enough?" We address this critical question
in the context of equi-height histograms used in many commercial products,
including Microsoft SQL Server. We introduce a conservative error
metric capturing the intuition that for an approximate histogram to have
low error, the error must be small in all regions of the histogram.
We then present a result establishing an optimal bound on the amount of
sampling required for pre-specified error bounds.
We also describe an adaptive page sampling algorithm
which achieves greater efficiency by using all values in a sampled page
but adjusts the amount of sampling depending
on clustering of values in pages. Next, we establish that the problem of
estimating the number of distinct values is provably difficult ,
but propose a new error metric which has a reliable estimator and can
still be exploited by query optimizers to influence the choice of
execution plans.
The algorithm for histogram construction was prototyped on Microsoft SQL
Server 7.0 and we present experimental results showing
that the adaptive algorithm accurately approximates the true histogram
over different data distributions.
Copyright © 1998 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...
Online Version (ACM WWW Account required): Full Text in PDF Format
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Laura M. Haas, Ashutosh Tiwary (Eds.):
SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA.
ACM Press 1998, ISBN 0-89791-995-5 ,
SIGMOD Record 27(2),
June 1998
Contents
[Abstract]
References
- [1]
- ...
- [2]
- ...
- [3]
- ...
- [4]
- ...
- [5]
- ...
- [6]
- Surajit Chaudhuri, Vivek R. Narasayya:
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server.
VLDB 1997: 146-155
- [7]
- Sheldon J. Finkelstein, Mario Schkolnick, Paolo Tiberio:
Physical Database Design for Relational Databases.
ACM Trans. Database Syst. 13(1): 91-128(1988)
- [8]
- Phillip B. Gibbons, Yossi Matias, Viswanath Poosala:
Fast Incremental Maintenance of Approximate Histograms.
VLDB 1997: 466-475
- [9]
- ...
- [10]
- Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes:
Sampling-Based Estimation of the Number of Distinct Values of an Attribute.
VLDB 1995: 311-322
- [11]
- Peter J. Haas, Arun N. Swami:
Sequential Sampling Procedures for Query Size Estimation.
SIGMOD Conference 1992: 341-350
- [12]
- Wen-Chi Hou, Gultekin Özsoyoglu, Erdogan Dogdu:
Error-Constraint COUNT Query Evaluation in Relational Databases.
SIGMOD Conference 1991: 278-287
- [13]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Statistical Estimators for Relational Algebra Expressions.
PODS 1988: 276-287
- [14]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Processing Aggregate Relational Queries with Hard Time Constraints.
SIGMOD Conference 1989: 68-77
- [15]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244
- [16]
- Yannis E. Ioannidis, Viswanath Poosala:
Histogram-Based Solutions to Diverse Database Estimation Problems.
IEEE Data Eng. Bull. 18(3): 10-18(1995)
- [17]
- Yibei Ling, Wei Sun:
An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems.
ICDE 1995: 532-539
- [18]
- Richard J. Lipton, Jeffrey F. Naughton:
Query Size Estimation by Adaptive Sampling.
PODS 1990: 40-46
- [19]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider:
Practical Selectivity Estimation through Adaptive Sampling.
SIGMOD Conference 1990: 1-11
- [20]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri:
Efficient Sampling Strategies for Relational Database Operations.
Theor. Comput. Sci. 116(1&2): 195-226(1993)
- [21]
- Rajeev Motwani, Prabhakar Raghavan:
Randomized Algorithms.
Cambridge University Press 1995, ISBN 0-521-47465-5
- [22]
- Jeffrey F. Naughton, S. Seshadri:
On Estimating the Size of Projections.
ICDT 1990: 499-513
- [23]
- ...
- [24]
- ...
- [25]
- Gultekin Özsoyoglu, Kaizheng Du, A. Tjahjana, Wen-Chi Hou, D. Y. Rowland:
On Estimating COUNT, SUM, and AVERAGE.
DEXA 1991: 406-412
- [26]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita:
Improved Histograms for Selectivity Estimation of Range Predicates.
SIGMOD Conference 1996: 294-305
- [27]
- Gregory Piatetsky-Shapiro, Charles Connell:
Accurate Estimation of the Number of Tuples Satisfying a Condition.
SIGMOD Conference 1984: 256-276
- [28]
- Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price:
Access Path Selection in a Relational Database Management System.
SIGMOD Conference 1979: 23-34
- [29]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949
Copyright © Fri Mar 12 17:21:34 2010
by Michael Ley (ley@uni-trier.de)