
Volume 14, No. 7

Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees

Authors:
Dimitrije Jankov (Rice University), Binhang Yuan (Rice University), Shangyu Luo (Rice University), Chris Jermaine (Rice University)

Abstract

When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.
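As a rough illustration of the idea in the abstract, the sketch below expresses a small matrix multiplication as an aggregated join tree: a join of two relations on the shared index, followed by a sum per output cell. It then runs the two phases in miniature. Everything in it is an assumption made for illustration: the relation layout, the function names `pilot_run`, `plan`, and `execute`, and above all the greedy load-balancing planner, which is a simple stand-in and not the planning algorithm proposed in the paper. Scalars stand in for the matrix blocks a real system would store.

```python
from collections import defaultdict

# Hypothetical relations: (row_index, col_index, value) records.
# A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]]
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
B = [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)]


def pilot_run(a_rel, b_rel):
    """Phase 1: collect lineage only. For every pair of records that joins
    (A.col == B.row), record the two record ids and the aggregation group
    (output cell) the pair feeds. Payload values are never read."""
    lineage = []
    for ai, (i, k, _) in enumerate(a_rel):
        for bi, (k2, j, _) in enumerate(b_rel):
            if k == k2:
                lineage.append((ai, bi, (i, j)))
    return lineage


def plan(lineage, num_workers):
    """Toy planner (an assumption, not the paper's method): place each
    aggregation group on the currently least-loaded worker, so all join
    results for one output cell are produced on the same worker."""
    groups = defaultdict(list)
    for ai, bi, group in lineage:
        groups[group].append((ai, bi))
    load = [0] * num_workers
    assignment = {}
    # Largest groups first, a common greedy heuristic for balancing.
    for group in sorted(groups, key=lambda g: -len(groups[g])):
        worker = min(range(num_workers), key=load.__getitem__)
        assignment[group] = worker
        load[worker] += len(groups[group])
    return dict(groups), assignment, load


def execute(a_rel, b_rel, groups, assignment, num_workers):
    """Phase 2: perform the joins and the aggregation according to the plan.
    Workers are simulated here as separate dictionaries."""
    per_worker = [dict() for _ in range(num_workers)]
    for group, pairs in groups.items():
        worker = assignment[group]
        per_worker[worker][group] = sum(
            a_rel[ai][2] * b_rel[bi][2] for ai, bi in pairs
        )
    merged = {}
    for partial in per_worker:
        merged.update(partial)
    return merged


NUM_WORKERS = 2
lineage = pilot_run(A, B)
groups, assignment, load = plan(lineage, NUM_WORKERS)
result = execute(A, B, groups, assignment, NUM_WORKERS)
print(result)
```

The point of the sketch is the separation of concerns: the pilot run touches only keys, so the plan can be made at the level of individual records before any expensive numerical work is done.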

PVLDB is part of the VLDB Endowment Inc.
