A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data.
Khaled Alsabti, Sanjay Ranka, Vineet Singh:
A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data.
VLDB 1997: 346-355@inproceedings{DBLP:conf/vldb/AlsabtiRS97,
author = {Khaled Alsabti and
Sanjay Ranka and
Vineet Singh},
editor = {Matthias Jarke and
Michael J. Carey and
Klaus R. Dittrich and
Frederick H. Lochovsky and
Pericles Loucopoulos and
Manfred A. Jeusfeld},
title = {A One-Pass Algorithm for Accurately Estimating Quantiles for
Disk-Resident Data},
booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
Large Data Bases, August 25-29, 1997, Athens, Greece},
publisher = {Morgan Kaufmann},
year = {1997},
isbn = {1-55860-470-7},
pages = {346-355},
ee = {db/conf/vldb/AlsabtiRS97.html},
crossref = {DBLP:conf/vldb/97},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
The phi-quantile of an ordered sequence of data values is the
element with rank phi*n, where n is the total number of
values. Accurate estimates of quantiles are required for the solution
of many practical problems. In this paper, we present a new algorithm
for estimating the quantile values for disk-resident data. Our
algorithm has the following characteristics: (1) It requires only one
pass over the data; (2) It is deterministic; (3) It produces good
lower and upper bounds of the true values of the quantiles; (4) It
requires no a priori knowledge of the distribution of the data set;
(5) It has a scalable parallel formulation; (6) Extra time and memory
for computing additional quantiles (beyond the first one) are constant
per quantile.
We present experimental results on the IBM SP-2. The experimental
results show that the algorithm is indeed robust and does not depend
on the distribution of the data sets.
Copyright © 1997 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.):
VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece.
Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents
Electronic Edition
From CS Dept.,
University Trier (Germany)
URL
Additional information is available from
http://www.cise.ufl.edu/~ranka
References
- [AIS93]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami:
Mining Association Rules between Sets of Items in Large Databases.
SIGMOD Conference 1993: 207-216
- [ALSS95]
- Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim:
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases.
VLDB 1995: 490-501
- [ARS97]
- ...
- [AS95]
- Rakesh Agrawal, Arun N. Swami:
A One-Pass Space-Efficient Algorithm for Finding Quantiles.
COMAD 1995: 0-
- [AS96]
- Ramakrishnan Srikant, Rakesh Agrawal:
Mining Quantitative Association Rules in Large Relational Tables.
SIGMOD Conference 1996: 1-12
- [B+72]
- Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, Robert Endre Tarjan:
Time Bounds for Selection.
J. Comput. Syst. Sci. 7(4): 448-461(1973)
- [Bat68]
- Kenneth E. Batcher:
Sorting Networks and Their Applications.
AFIPS Spring Joint Computing Conference 1968: 307-314
- [Coc77]
- William G. Cochran:
Sampling Techniques, 3rd Edition.
John Wiley 1977, ISBN 0-471-16240-X
- [D+91]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider:
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting.
PDIS 1991: 280-291
- [FR75]
- Robert W. Floyd, Ronald L. Rivest:
Expected Time Bounds for Selection.
Commun. ACM 18(3): 165-172(1975)
- [GS90]
- ...
- [JC85]
- Raj Jain, Imrich Chlamtac:
The P² Algorithm for Dynamic Calculation of Quantiiles and Histograms Without Storing Observations.
Commun. ACM 28(10): 1076-1085(1985)
- [KGGK94]
- Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis:
Introduction to Parallel Computing.
Benjamin/Cummings 1994, ISBN 0-8053-3170-0
- [Koo80]
- Robert Kooi:
The Optimization of Queries in Relational Databases.
Ph.D. thesis, Case Western Reserve University 1980
- [LLS+93]
- Xiaobo Li, Paul Lu, Jonathan Schaeffer, John Shillington, Pok Sze Wong, Hanmao Shi:
On the Versatility of Parallel Sorting by Regular Sampling.
Parallel Computing 19(10): 1079-1103(1993)
- [MD88]
- M. Muralikrishna, David J. DeWitt:
Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries.
SIGMOD Conference 1988: 28-36
- [MP80]
- J. Ian Munro, Mike Paterson:
Selection and Sorting with Limited Storage.
Theor. Comput. Sci. 12: 315-323(1980)
- [PIHS96]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita:
Improved Histograms for Selectivity Estimation of Range Predicates.
SIGMOD Conference 1996: 294-305
- [Ps84]
- Gregory Piatetsky-Shapiro, Charles Connell:
Accurate Estimation of the Number of Tuples Satisfying a Condition.
SIGMOD Conference 1984: 256-276
- [SD77]
- ...
- [Zip49]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949
Copyright © Tue Mar 16 02:22:06 2010
by Michael Ley (ley@uni-trier.de)