
Volume 16, No. 12

To UDFs and Beyond: Demonstration of a Fully Decomposed Data Processor for General Data Wrangling Tasks

Authors:
Nico Schäfer, Damjan Gjurovski, Angjela Davitkova, Sebastian Michel

Abstract

While existing data management solutions try to keep up with novel data formats and features, a great deal of valuable functionality is often accessible only through programming-language libraries. For machine learning tasks in particular, there is a wealth of pre-trained models and easy-to-use libraries that allow a wide audience to harness state-of-the-art machine learning. We propose the demonstration of a highly modularized data processor for semi-structured data that can be extended by means of plain Python scripts. Beyond the commonly supported user-defined functions (UDFs), this deep decomposition allows the core engine to be augmented with additional index structures, customized import and export routines, and custom aggregation functions. For several use cases, we detail how user-defined modules can be realized quickly, and we invite the audience to write and apply their own code, or to adapt the code snippets we bring along to their own preferences, to solve data analytics tasks involving sentiment analysis of tweets from Twitter.
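To make the idea of extending an engine with plain Python scripts more concrete, here is a minimal sketch, assuming a hypothetical registry in which ordinary Python functions are registered as importers, UDFs, or aggregation functions. The names `REGISTRY`, `register`, and `run_query` are illustrative only and are not the demonstrated system's actual API; the small word-list scorer merely stands in for a pre-trained sentiment model, and index structures and export routines are omitted for brevity.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical plugin registry: one dictionary per extension point.
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {
    "importer": {},
    "udf": {},
    "aggregate": {},
}


def register(kind: str, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a plain Python function under an extension point."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[kind][name] = fn
        return fn
    return decorator


@register("importer", "jsonl")
def import_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse one JSON document per non-empty line into semi-structured records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Toy lexicon standing in for a pre-trained sentiment model (assumption).
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}


@register("udf", "sentiment")
def sentiment(text: str) -> int:
    """Return the number of positive words minus the number of negative words."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


@register("aggregate", "avg")
def avg(values: Iterable[float]) -> float:
    """Custom aggregation: arithmetic mean, 0.0 for an empty group."""
    vals = list(values)
    return sum(vals) / len(vals) if vals else 0.0


def run_query(raw: str, importer: str, group_key: str, udf: str, agg: str) -> Dict[str, float]:
    """Import records, apply a UDF to each record's text, and aggregate per group.

    Records lacking the grouping attribute are skipped, since semi-structured
    records need not share the same set of fields.
    """
    records = REGISTRY["importer"][importer](raw)
    groups: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        key = rec.get(group_key)
        if key is None:
            continue
        groups[key].append(REGISTRY["udf"][udf](rec["text"]))
    return {k: REGISTRY["aggregate"][agg](v) for k, v in groups.items()}


if __name__ == "__main__":
    raw = "\n".join([
        '{"tag": "vldb", "text": "I love this great demo"}',
        '{"tag": "vldb", "text": "bad wifi"}',
        '{"tag": "python", "text": "good libraries"}',
        '{"text": "no tag here"}',
    ])
    print(run_query(raw, "jsonl", "tag", "sentiment", "avg"))
    # prints {'vldb': 0.5, 'python': 1.0}
```

In this sketch the query driver only looks up components by name, so swapping the toy scorer for a model-backed function, or the mean for another aggregate, means registering a new function rather than changing the driver.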

PVLDB is part of the VLDB Endowment Inc.
