Volume 15, No. 1

WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model

Authors:
Gyeong-In Yu (Seoul National University)*, Saeed Amizadeh (Microsoft), Sehoon Kim (University of California, Berkeley), Artidoro Pagnoni (Carnegie Mellon University), Ce Zhang (ETH), Byung-Gon Chun (Seoul National University), Markus Weimer (Microsoft), Matteo Interlandi (Microsoft)

Abstract

While deep neural networks (DNNs) have been shown to be successful in several domains such as computer vision, non-DNN models such as linear models and gradient boosting trees are still considered state-of-the-art over tabular data. When using these models, data scientists often author machine learning (ML) pipelines: DAGs of ML operators comprising data transforms and ML models, whereby each operator is trained sequentially, one at a time. Conversely, when training DNNs, the layers composing the neural network are trained simultaneously using backpropagation. In this paper, we argue that the training scheme of ML pipelines is sub-optimal because it optimizes a single operator at a time and thus loses the opportunity for global optimization. We therefore propose WindTunnel: a system that translates a trained ML pipeline into a pipeline of neural network modules and jointly optimizes the modules using backpropagation. We also suggest translation methodologies for several non-differentiable operators, such as gradient boosting trees and categorical feature encoders. Our experiments show that fine-tuning the translated WindTunnel pipelines is a promising technique for increasing final accuracy.
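To make the core idea concrete, the following is a minimal sketch of what "translating a pipeline into neural modules and fine-tuning it end-to-end" could look like, assuming PyTorch. The module names, architectures, and initialization choices here are illustrative assumptions, not the actual WindTunnel implementation: the encoder and tree-ensemble surrogates are stand-ins for the paper's translation methodologies.

```python
# Hypothetical sketch: a trained pipeline (categorical encoder -> tree model)
# translated into differentiable modules and fine-tuned jointly with
# backpropagation. Names and architectures are illustrative, not WindTunnel's.
import torch
import torch.nn as nn

class CategoricalEncoderModule(nn.Module):
    """Differentiable stand-in for a fitted categorical feature encoder,
    which in practice would be initialized to reproduce the original encoding."""
    def __init__(self, num_categories: int, dim: int):
        super().__init__()
        self.embedding = nn.Embedding(num_categories, dim)

    def forward(self, cat_ids: torch.Tensor) -> torch.Tensor:
        return self.embedding(cat_ids)

class TreeEnsembleSurrogate(nn.Module):
    """Differentiable surrogate for a trained gradient boosting ensemble,
    e.g. a small MLP initialized to approximate the ensemble's predictions."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# The translated pipeline is a composition of modules, so gradients reach
# every operator during joint fine-tuning rather than one operator at a time.
encoder = CategoricalEncoderModule(num_categories=100, dim=8)
model = TreeEnsembleSurrogate(in_dim=8)
pipeline = nn.Sequential(encoder, model)

optimizer = torch.optim.Adam(pipeline.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

cat_ids = torch.randint(0, 100, (32,))   # toy batch of categorical inputs
targets = torch.randn(32, 1)             # toy regression targets

for _ in range(10):                       # joint fine-tuning loop
    optimizer.zero_grad()
    preds = pipeline(cat_ids)
    loss = loss_fn(preds, targets)
    loss.backward()                       # gradients flow through all operators
    optimizer.step()
```

In this toy setup, the embedding table and the surrogate network are updated together by backpropagation, which is the contrast the abstract draws against training each pipeline operator in isolation.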
