Towards A Polyglot Framework for Factorized ML

Authors:

David A Justo (University of California, San Diego), Shaoqing Yi (University of California, San Diego), Lukas Stadler (Oracle Labs), Nadia Polikarpova (University of California, San Diego), Arun Kumar (University of California, San Diego)

Abstract

Optimizing machine learning (ML) workloads on structured data is a key concern for data platforms. One class of optimizations called “factorized ML” helps reduce ML runtimes over multi-table datasets by pushing ML computations down through joins, avoiding the need to materialize such joins. The recent Morpheus system automated factorized ML for any ML algorithm expressible in linear algebra (LA). But all such prior factorized ML/LA stacks are restricted by their chosen programming language (PL) and runtime environment, limiting their reach in emerging industrial data science environments with many PLs (R, Python, etc.) and even cross-PL analytics workflows. Re-implementing Morpheus from scratch in each PL/environment is a massive developability overhead for implementation, testing, and maintenance. We tackle this challenge by proposing a new system architecture, Trinity, to enable factorized LA logic to be written only once and easily reused across many PLs/LA tools in one go. To do this in an extensible and efficient manner without costly data copies, Trinity leverages and extends an emerging industrial polyglot compiler and runtime, Oracle’s GraalVM. Trinity enables factorized LA in multiple PLs and even cross-PL workflows. Experiments with real datasets show that Trinity is significantly faster than materialized execution (>8x speedups in some cases), while being largely competitive with a prior single PL-focused Morpheus stack.
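
To make "pushing ML computations down through joins" concrete, the following is a minimal sketch of one factorized LA operation, a matrix-vector product over a two-table join. It is not code from Trinity or Morpheus; the names `S`, `R`, `fk`, and both functions are hypothetical and chosen only for illustration. It assumes an entity table `S`, an attribute table `R`, and a foreign-key list `fk` that gives, for each row of `S`, the index of its matching row in `R`.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def materialized_matvec(S, R, fk, w):
    """Join first: build every row of T = [S, R[fk]], then compute T @ w."""
    T = [s_row + R[k] for s_row, k in zip(S, fk)]
    return [dot(t_row, w) for t_row in T]


def factorized_matvec(S, R, fk, w):
    """Push the product through the join; the joined table T is never built."""
    d_S = len(S[0])
    w_S, w_R = w[:d_S], w[d_S:]
    # The R-side partial product is computed once per row of R ...
    partial_R = [dot(r_row, w_R) for r_row in R]
    # ... and then looked up via the foreign key for each row of S.
    return [dot(s_row, w_S) + partial_R[k] for s_row, k in zip(S, fk)]


# Toy data (hypothetical): 4 rows in S, 2 rows in R, so R's rows repeat in the join.
S = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
R = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
fk = [0, 1, 1, 0]
w = [1.0, -1.0, 0.5, 0.0, 2.0]

joined = materialized_matvec(S, R, fk, w)
pushed = factorized_matvec(S, R, fk, w)
assert all(abs(a - b) < 1e-9 for a, b in zip(joined, pushed))
```

Both functions return the same vector, but the factorized version does the `R`-side work once per row of `R` instead of once per joined row. That saving grows with the redundancy the join introduces, that is, with how often each row of `R` is repeated.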

PVLDB is part of the VLDB Endowment Inc.

Volume 14, No. 12