Volume 15, No. 1

WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model

Authors:
Gyeong-In Yu (Seoul National University)*, Saeed Amizadeh (Microsoft), Sehoon Kim (University of California, Berkeley), Artidoro Pagnoni (Carnegie Mellon University), Ce Zhang (ETH), Byung-Gon Chun (Seoul National University), Markus Weimer (Microsoft), Matteo Interlandi (Microsoft)

Abstract

While deep neural networks (DNNs) have been shown to be successful in several domains such as computer vision, non-DNN models such as linear models and gradient boosting trees are still considered state-of-the-art over tabular data. When using these models, data scientists often author machine learning (ML) pipelines: DAGs of ML operators comprising data transforms and ML models, whereby each operator is trained sequentially, one at a time. Conversely, when training DNNs, the layers composing the neural network are trained simultaneously using backpropagation. In this paper, we argue that the training scheme of ML pipelines is sub-optimal because it optimizes a single operator at a time and thus loses the opportunity for global optimization. We therefore propose WindTunnel: a system that translates a trained ML pipeline into a pipeline of neural network modules and jointly optimizes the modules using backpropagation. We also suggest translation methodologies for several non-differentiable operators, such as gradient boosting trees and categorical feature encoders. Our experiments show that fine-tuning the translated WindTunnel pipelines is a promising technique for increasing final accuracy.
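To make the core idea concrete, the following is a minimal sketch of what "translating a pipeline into neural modules and fine-tuning it end-to-end" could look like, assuming PyTorch. The module names, architectures, and initialization choices here are illustrative assumptions, not the actual WindTunnel implementation: the encoder and tree-ensemble surrogates are stand-ins for the paper's translation methodologies.

```python
# Hypothetical sketch: a trained pipeline (categorical encoder -> tree model)
# translated into differentiable modules and fine-tuned jointly with
# backpropagation. Names and architectures are illustrative, not WindTunnel's.
import torch
import torch.nn as nn

class CategoricalEncoderModule(nn.Module):
    """Differentiable stand-in for a fitted categorical feature encoder,
    which in practice would be initialized to reproduce the original encoding."""
    def __init__(self, num_categories: int, dim: int):
        super().__init__()
        self.embedding = nn.Embedding(num_categories, dim)

    def forward(self, cat_ids: torch.Tensor) -> torch.Tensor:
        return self.embedding(cat_ids)

class TreeEnsembleSurrogate(nn.Module):
    """Differentiable surrogate for a trained gradient boosting ensemble,
    e.g. a small MLP initialized to approximate the ensemble's predictions."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# The translated pipeline is a composition of modules, so gradients reach
# every operator during joint fine-tuning rather than one operator at a time.
encoder = CategoricalEncoderModule(num_categories=100, dim=8)
model = TreeEnsembleSurrogate(in_dim=8)
pipeline = nn.Sequential(encoder, model)

optimizer = torch.optim.Adam(pipeline.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

cat_ids = torch.randint(0, 100, (32,))   # toy batch of categorical inputs
targets = torch.randn(32, 1)             # toy regression targets

for _ in range(10):                       # joint fine-tuning loop
    optimizer.zero_grad()
    preds = pipeline(cat_ids)
    loss = loss_fn(preds, targets)
    loss.backward()                       # gradients flow through all operators
    optimizer.step()
```

In this toy setup, the embedding table and the surrogate network are updated together by backpropagation, which is the contrast the abstract draws against training each pipeline operator in isolation.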
