Volume 15, No. 5
Projection-Compliant Database Generation
Abstract
Synthetic databases that exhibit a desired set of characteristics are required in a variety of use cases, ranging from testing and tuning of database engines and applications to system benchmarking. Expressing these characteristics through declarative formalisms has been advocated in contemporary generation frameworks. In particular, specifying operator output volumes through row-cardinality constraints has received considerable attention. However, thus far, adherence to these volumetric constraints has been limited to the Filter and Join operators. A critical deficiency is the lack of support for the Projection operator, which forms the core of basic SQL constructs such as Distinct, Union and Group By. The technical challenge here is that cardinality unions in multi-dimensional space, and not mere summations, need to be captured in the generation process. Further, dependencies across different data subspaces need to be taken into account. In this paper, we address the above lacuna by presenting PiGen, a dynamic data generator that incorporates Projection cardinality constraints in its ambit. The design is based on a projection subspace division strategy which supports the expression of constraints using optimized linear programming formulations. Further, techniques of symmetric refinement and workload decomposition are introduced to handle constraints across different projection subspaces. Finally, PiGen supports dynamic generation, where data is generated on-demand during query processing, making it amenable to Big Data environments. A detailed evaluation on workloads derived from real-world and synthetic benchmarks demonstrates that PiGen can accurately and efficiently model Projection outcomes, representing an essential step forward in customized database generation.
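The remark that Projection needs cardinality unions rather than summations can be illustrated with a small sketch. The toy table, the predicates, and the helper names below are hypothetical and chosen only for illustration; this is not PiGen's algorithm, just a demonstration of why Filter counts add up across disjoint regions while distinct counts do not.

```python
# Hypothetical toy table R(a, b); each row is an (a, b) pair.
rows = [(1, 10), (1, 20), (2, 10), (2, 30), (3, 30)]

def filter_count(rows, pred):
    """Row cardinality of a Filter: number of rows satisfying pred."""
    return sum(1 for r in rows if pred(r))

def distinct_b(rows, pred):
    """Projection on b with duplicate elimination over the rows satisfying pred."""
    return {b for (a, b) in rows if pred((a, b))}

# Two disjoint regions of the data space, split on attribute a.
region1 = lambda r: r[0] <= 1
region2 = lambda r: r[0] >= 2

# Filter cardinalities over disjoint regions simply add up.
filter_total = filter_count(rows, region1) + filter_count(rows, region2)

# Projection cardinalities do not: the value b = 10 occurs in both regions,
# so the distinct count of the whole is the size of the set union.
d1 = distinct_b(rows, region1)   # {10, 20}
d2 = distinct_b(rows, region2)   # {10, 30}
sum_of_projections = len(d1) + len(d2)
projection_of_union = len(d1 | d2)

print(filter_total, sum_of_projections, projection_of_union)
```

Here the Filter counts sum to the table size, whereas summing the per-region distinct counts overstates the true Projection cardinality because the regions share a projected value.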
PVLDB is part of the VLDB Endowment Inc.