Volume 14, No. 7
Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees
Abstract
When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.
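As a rough illustration of the ideas in the abstract, the sketch below expresses a small matrix multiplication as a join followed by an aggregation, and splits the work into a pilot run that records lineage and an execution step that consumes it. This is a minimal single-machine sketch, not the paper's system: the record layout `(row, col, value)`, the function names `pilot_run` and `execute`, and the lineage format are all assumptions made for illustration, and the planning step that would place records on workers is omitted.

```python
from collections import defaultdict

# Hypothetical layout: each relation stores a matrix as (row, col, value) records.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
B = [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)]


def pilot_run(a, b):
    """Phase 1 (assumed form): join on A.col = B.row using keys only.

    Returns lineage entries (index into a, index into b, output group key),
    without reading or multiplying any values.
    """
    b_by_row = defaultdict(list)
    for j, (row, _col, _val) in enumerate(b):
        b_by_row[row].append(j)

    lineage = []
    for i, (a_row, a_col, _val) in enumerate(a):
        for j in b_by_row[a_col]:
            lineage.append((i, j, (a_row, b[j][1])))
    return lineage


def execute(a, b, lineage):
    """Phase 2: multiply each joined pair and sum per output group (the aggregation)."""
    out = defaultdict(float)
    for i, j, key in lineage:
        out[key] += a[i][2] * b[j][2]
    return dict(out)


lineage = pilot_run(A, B)
C = execute(A, B, lineage)
print(C)
```

In a distributed setting, the lineage list is what a planner could inspect to decide where each joined pair and each output group should live before any values are moved; here it is simply replayed in order.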
PVLDB is part of the VLDB Endowment Inc.