@inproceedings{DBLP:conf/vldb/RossS97, author = {Kenneth A. Ross and Divesh Srivastava}, editor = {Matthias Jarke and Michael J. Carey and Klaus R. Dittrich and Frederick H. Lochovsky and Pericles Loucopoulos and Manfred A. Jeusfeld}, title = {Fast Computation of Sparse Datacubes}, booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece}, publisher = {Morgan Kaufmann}, year = {1997}, isbn = {1-55860-470-7}, pages = {116-125}, ee = {db/conf/vldb/RossS97.html}, crossref = {DBLP:conf/vldb/97}, bibsource = {DBLP, http://dblp.uni-trier.de} }
Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of CUBE BY attributes, especially when the relation is much larger than main memory.
We propose a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrate the efficiency of our algorithm using synthetic, benchmark and real-world data sets. When the relation fits in memory, our technique performs multiple in-memory sorts, and does not incur any I/O beyond the input of the relation and the output of the datacube itself. When the relation does not fit in memory, a divide-and-conquer strategy divides the problem of computing the datacube into several simpler computations of sub-datacubes. Often, all but one of the sub-datacubes can be computed in memory and our in-memory solution applies. In that case, the total I/O overhead is linear in the number of CUBE BY attributes. We demonstrate with an implementation that the CPU cost of our algorithm is dominated by the I/O cost for sparse relations.
Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.