Accelerating Python UDFs in Vectorized Query Execution
Abstract
Modern analytical database systems support user-defined functions (UDFs) as a flexible extension to SQL. Python is one of the most popular UDF languages: it is easy to use and offers a rich feature set for data-intensive tasks, but it suffers from poor performance and scalability. In this work, we describe approaches to accelerate embedded Python UDF execution using vectorization, parallelization and compilation. We compare different compilation frameworks and show how Python code can be compiled, dynamically loaded and invoked from queries at database runtime in a transparent way. Our evaluation shows that combining compilation with parallelization yields significant speedups across various use cases.