Volume 15, No. 5
User-Defined Operators: Efficiently Integrating Custom Algorithms into Modern Databases
Abstract
Complex data mining and machine learning algorithms have become increasingly common in data analytics in recent years. Several specialized systems exist that can evaluate these algorithms on ever-growing data sets, and they are built to execute different kinds of complex analytics queries efficiently. Using multiple systems comes at a price, however. Moving data out of traditional database systems is often slow because it requires exporting and importing data, often in the relatively inefficient CSV format. In addition, database systems usually offer strong ACID guarantees, which are lost when new, external systems are added; this can be detrimental to the consistency of the results. Nevertheless, most data scientists still prefer not to use classical database systems for their analytics. The main reason is that SQL is hard to work with due to its declarative and set-oriented nature and is not easily extensible. To improve this, we present User-Defined Operators as a concept for integrating custom algorithms into modern query engines. Users can write idiomatic code in the programming language of their choice, which is then directly integrated into existing databases. We show that our implementation can compete with specialized tools and existing query engines while retaining all beneficial properties of the database system.
PVLDB is part of the VLDB Endowment Inc.