Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies.

Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton, Karthikeyan Ramasamy: Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies. VLDB 1996: 522-531

@inproceedings{DBLP:conf/vldb/ShuklaDNR96,
  author    = {Amit Shukla and
               Prasad Deshpande and
               Jeffrey F. Naughton and
               Karthikeyan Ramasamy},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Storage Estimation for Multidimensional Aggregates in the Presence
               of Hierarchies},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {522-531},
  ee        = {db/conf/vldb/ShuklaDNR96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

To speed up multidimensional data analysis, database systems frequently precompute aggregates on some subsets of dimensions and their corresponding hierarchies. This improves query response time. However, the decision of what and how much to precompute is a difficult one. It is further complicated by the fact that precomputation in the presence of hierarchies can result in an unintuitively large increase in the amount of storage required by the database. Hence, it is interesting and useful to estimate the storage blowup that will result from a proposed set of precomputations without actually computing them. We propose three strategies for this problem: one based on sampling, one mathematical approximation, and one based on probabilistic counting. We investigate the accuracy of these algorithms in estimating the blowup for different data distributions and database schemas. The algorithm based upon probabilistic counting is particularly attractive, since it estimates the storage blowup to within provable error bounds while performing only a single scan of the data.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

Download PDF file (www.vldb.org, Darmstadt, Germany)
Download PDF file (www.acm.org, New York, USA)

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the CD and click on the path of your mount point:
/Anthology/vldb8997 or /cdrom

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the DVD and click on the path of your mount point:
/Anthology/aDVD1 or /dvd

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents

Electronic Edition

From CS Dept., University Trier (Germany)

References

[AAD+96]: Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional Aggregates. VLDB 1996: 506-521
[AGS95]: ...
[CCS93]: ...
[Fel57]: ...
[FM85]: Philippe Flajolet, G. Nigel Martin: Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci. 31(2): 182-209(1985)
[GBLP96]: Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159
[HNSS93]: ...
[HNSS95]: Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322
[HRU96]: Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data Cubes Efficiently. SIGMOD Conference 1996: 205-216
[KT95]: ...
[Str95]: ...
[Zip49]: George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949