Quickly Generating Billion-Record Synthetic Databases.
Jim Gray, Prakash Sundaresan, Susanne Englert, Kenneth Baclawski, Peter J. Weinberger:
Quickly Generating Billion-Record Synthetic Databases.
SIGMOD Conference 1994: 243-252
@inproceedings{DBLP:conf/sigmod/GraySEBW94,
author = {Jim Gray and
Prakash Sundaresan and
Susanne Englert and
Kenneth Baclawski and
Peter J. Weinberger},
editor = {Richard T. Snodgrass and
Marianne Winslett},
title = {Quickly Generating Billion-Record Synthetic Databases},
booktitle = {Proceedings of the 1994 ACM SIGMOD International Conference on
Management of Data, Minneapolis, Minnesota, May 24-27, 1994},
publisher = {ACM Press},
year = {1994},
pages = {243-252},
ee = {http://doi.acm.org/10.1145/191839.191886, db/conf/sigmod/GraySEBW94.html},
crossref = {DBLP:conf/sigmod/94},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Evaluating database system performance often requires generating synthetic databases - ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses:
(1) Parallelism to get generation speedup and scaleup.
(2) Congruential generators to get dense unique uniform distributions.
(3) Special-case discrete logarithms to generate indices concurrent to the base table generation.
(4) Modification of (2) to get exponential, normal, and self-similar distributions.
The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems.
Copyright © 1994 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Printed Edition
Richard T. Snodgrass, Marianne Winslett (Eds.):
Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24-27, 1994.
ACM Press 1994, SIGMOD Record 23(2), June 1994
References
- [Bitton 1]
- ...
- [Bitton 2]
- Dina Bitton, David J. DeWitt, Carolyn Turbyfill:
Benchmarking Database Systems: A Systematic Approach.
VLDB 1983: 8-19
- [Coppersmith]
- Don Coppersmith, Andrew M. Odlyzko, Richard Schroeppel:
Discrete Logarithms in GF(p).
Algorithmica 1(1): 1-15(1986)
- [DeWitt 1]
- David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna:
GAMMA - A High Performance Dataflow Database Machine.
VLDB 1986: 228-237
- [DeWitt 2]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen:
The Gamma Database Machine Project.
IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990)
- [DeWitt 3]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider:
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting.
PDIS 1991: 280-291
- [Englert]
- Susanne Englert, Jim Gray, Terrye Kocher, Praful Shah:
A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases.
SIGMETRICS 1990: 245-246
- [Gerber]
- ...
- [Hobbs]
- ...
- [Horst]
- ...
- [Jain]
- ...
- [Kim]
- Michelle Y. Kim:
Synchronized Disk Interleaving.
IEEE Trans. Computers 35(11): 978-988(1986)
- [Knuth]
- Donald E. Knuth:
The Art of Computer Programming, Volume II: Seminumerical Algorithms, 2nd Edition.
Addison-Wesley 1981, ISBN 0-201-03822-6
- [Kronenberg]
- ...
- [Nyberg]
- Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, David B. Lomet:
AlphaSort: A RISC Machine Sort.
SIGMOD Conference 1994: 233-242
- [Press]
- William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery:
Numerical Recipes in C, 2nd Edition.
Cambridge University Press 1992
- [Ripley]
- ...
- [Schrage]
- ...
- [Smith]
- Marc G. Smith, William Alexander, Haran Boral, George P. Copeland, Tom W. Keller, Herbert D. Schwetman, Chii-Ren Young:
An Experiment on Response Time Scalability in Bubba.
IWDM 1989: 34-57
- [Stonebraker]
- Michael Stonebraker:
The Case for Shared Nothing.
IEEE Database Eng. Bull. 9(1): 4-9(1986)
- [Tanenbaum]
- ...
- [Teradata]
- ...
- [Thekkath]
- Chandramohan A. Thekkath, Henry M. Levy:
Limits to Low-Latency Communication on High-Speed Networks.
ACM Trans. Comput. Syst. 11(2): 179-203(1993)
- [TPC]
- ...
- [Uren]
- ...