Volume 14, No. 13

PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!

Authors:
H M Sajjad Hossain (Microsoft), Marc T Friedman (Microsoft), Hiren Patel (Microsoft), Shi Qiao (Microsoft), Soundar Srinivasan (Microsoft), Markus Weimer (Microsoft), Remmelt Ammerlaan (Microsoft), Lucas Rosenblatt (NYU), Gilbert Antonius (Microsoft), Peter Orenberg (Microsoft), Vijay Ramani (Microsoft), Abhishek Roy (Microsoft), Irene Shaffer (Microsoft), Alekh Jindal (Microsoft)

Abstract

Modern cloud workloads require tuning and optimization at massive scale, and automated optimization using machine learning models (ML-for-Systems) has shown promising results. These models, however, are prone to overgeneralization: they fail to capture the large variety of workload patterns and tend to improve performance for some subsets of the workload while regressing it for others. In this paper, we introduce a performance safeguard system (PerfGuard) that assists in designing pre-production experiments to inform model deployment. Our experimentation pipeline circumvents searching the entire query plan space (a well-known, intractable problem) and instead focuses on plan structure deltas, a significantly smaller space. Our ML approach formalizes these differences and correlates plan deltas with important feedback signals such as execution cost. We share our end-to-end pipeline structure and deep learning architecture as a prototype system for use with general relational databases. We demonstrate that this architecture improves on baseline models and that our pipeline identifies key query plan components as major contributors to plan disparity. Offline experiments validate focusing on plan changes as a promising approach, with many opportunities for future improvement.
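To make the plan-delta idea concrete, the following is a minimal Python sketch, not the authors' implementation: it represents query plans as operator trees and reports only the subtrees that changed between a pre-deployment and post-deployment plan, illustrating why the delta space is so much smaller than the full plan space. `PlanNode`, `plan_delta`, and the toy plans are hypothetical names introduced purely for illustration.

```python
# Minimal sketch of plan-delta extraction, assuming query plans are simple
# operator trees. PlanNode, plan_delta, and the toy plans below are
# hypothetical illustrations, not PerfGuard's actual data structures.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PlanNode:
    op: str                                      # operator name, e.g. "HashJoin"
    children: List["PlanNode"] = field(default_factory=list)

    def signature(self) -> str:
        # Canonical string for a subtree; identical subtrees share a signature.
        return f"{self.op}({','.join(c.signature() for c in self.children)})"

def plan_delta(before: PlanNode, after: PlanNode) -> List[Tuple[str, str]]:
    """Collect (before_subtree, after_subtree) pairs that differ.

    Rather than enumerating the full plan space, we only descend into
    subtrees whose signatures disagree -- the 'plan structure delta'.
    """
    if before.signature() == after.signature():
        return []                                # identical subtrees: no delta
    if before.op != after.op or len(before.children) != len(after.children):
        # The structural change is rooted here; report the whole subtree pair.
        return [(before.signature(), after.signature())]
    # Same operator and arity: recurse to localize the change further down.
    deltas: List[Tuple[str, str]] = []
    for b, a in zip(before.children, after.children):
        deltas.extend(plan_delta(b, a))
    return deltas

if __name__ == "__main__":
    scan_a = PlanNode("Scan[A]")
    scan_b = PlanNode("Scan[B]")
    old = PlanNode("Filter", [PlanNode("HashJoin", [scan_a, scan_b])])
    new = PlanNode("Filter", [PlanNode("MergeJoin", [scan_a, scan_b])])
    # Only the changed join subtree is reported; the rest of the plan is skipped.
    for before_sig, after_sig in plan_delta(old, new):
        print(f"{before_sig}  ->  {after_sig}")
```

In a pipeline like the one the abstract describes, such delta pairs would then be featurized and fed to a model that predicts a feedback signal such as execution cost; the sketch above covers only the delta-extraction step.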
