
Accelerating Python UDFs in Vectorized Query Execution

Authors:
Steffen Kläbe, Bobby DeSantis, Stefan Hagedorn, Kai-Uwe Sattler
Abstract

Modern analytical database systems support user-defined functions (UDFs) as a flexible extension to SQL. Python is one of the most popular UDF languages: it is easy to use and offers a rich feature set for data-intensive tasks, but it also suffers from poor performance and scalability. In this work, we describe approaches to accelerate embedded Python UDF execution using vectorization, parallelisation and compilation. We compare different compilation frameworks and show how Python code can be compiled, dynamically loaded and queried at database runtime in a transparent way. Our evaluation shows that combining compilation and parallelisation leads to significant speedups for various use cases.
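
To make the terms concrete, the following is a minimal sketch of the difference between row-at-a-time, vectorized, and parallel UDF invocation. It is not the system described in the paper. All names (udf_scalar, udf_vectorized, run_vectorized, run_parallel, the vector size of 1024) are hypothetical and chosen only for illustration, and a plain Python list stands in for a database column.

```python
from concurrent.futures import ThreadPoolExecutor


def udf_scalar(x):
    """Hypothetical UDF body, invoked once per row."""
    return x * x + 1


def udf_vectorized(xs):
    """The same logic, invoked once per vector (chunk) of rows."""
    return [x * x + 1 for x in xs]


def chunks(column, size):
    """Split a column into consecutive vectors of at most `size` values."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def run_row_at_a_time(column):
    # One UDF call per value: the call overhead is paid for every row.
    return [udf_scalar(value) for value in column]


def run_vectorized(column, vector_size=1024):
    # One UDF call per vector: the call overhead is amortised over the chunk.
    out = []
    for chunk in chunks(column, vector_size):
        out.extend(udf_vectorized(chunk))
    return out


def run_parallel(column, vector_size=1024, workers=4):
    # Vectors are handed to a worker pool; map() returns results in input order.
    # Assumption: threads are used only to show the structure. In CPython the
    # GIL keeps pure-Python code from running concurrently, so a real speedup
    # would need compiled code that releases the GIL, or separate processes.
    out = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(udf_vectorized, chunks(column, vector_size)):
            out.extend(result)
    return out


if __name__ == "__main__":
    column = list(range(5000))
    assert run_row_at_a_time(column) == run_vectorized(column) == run_parallel(column)
```

All three functions return the same values in the same order; they differ only in how often the UDF is called and whether chunks are dispatched to several workers.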