Proceedings of CIDR

Session 1: Query Optimization

Simplicity Done Right for Join Ordering

Axel Hertzschuch, Claudio Hartmann, Dirk Habich, Wolfgang Lehner

Progressive Join Algorithms Considering User Preference

Mengsu Ding, Shimin Chen, Nantia Makrynioti, Stefan Manegold

Accelerating Complex Analytics using Speculation

Panagiotis Sioulas, Viktor Sanca, Ioannis Mytilinis, Anastasia Ailamaki

Session 2: Blockchain and Transactions

chainifyDB: How to get rid of your Blockchain and use your DBMS instead

Felix Schuhknecht

Fraud Buster: Tracking IRSF Using Blockchain While Protecting Business Confidentiality

Shuaicheng Ma, Tamraparni Dasu, Yaron Kanza

Contention and Space Management in B-Trees

Adnan Alhomssi, Viktor Leis

Session 3: Data Analytics

Putting Pandas in a Box

Stefan Hagedorn, Steffen Kläbe, Kai-Uwe Sattler

Magpie: Python at Speed and Scale using Cloud Backends

Alekh Jindal, K Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas Mueller, Wentao Wu, Hiren Patel

Leam: An Interactive System for In-situ Visual Text Analysis

Sajjadur Rahman, Peter Griggs, Çağatay Demiralp

Session 4: New Database Engines

AnyDB: An Architecture-less DBMS for Any Workload

Tiemo Bang, Norman May, Ilia Petrov, Carsten Binnig

VergeDB: A Database for IoT Analytics on Edge Devices

John Paparrizos, Chunwei Liu, Bruno Barbarioli, Johnny Hwang, Ikraduya Edian, Aaron J Elmore, Michael J Franklin, Sanjay Krishnan

Boxer: Data Analytics on Network-enabled Serverless Platforms

Michael Wawrzoniak, Ingo Müller, Rodrigo Fraga Barcelos Paulus Bruno, Gustavo Alonso

Session 5: (Semi)-Supervised Learning

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

Laurel Orr, Megan Leszczynski, Neel Guha, Sen Wu, Simran Arora, Xiao Ling, Christopher Ré

Semi-Supervised Data Cleaning with Raha and Baran

Mohammad Mahdavi, Ziawasch Abedjan

Learned Approximate Query Processing: Make it Light, Accurate and Fast

Qingzhi Ma, Ali M Shanghooshabad, Mehrdad Almasi, Meghdad Kurmanji, Peter Triantafillou

Session 6: Trends and New Directions

New Directions in Cloud Programming

Alvin Cheung, Natacha Crooks, Joseph M Hellerstein, Mae Milano

Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics

Michael Armbrust, Ali Ghodsi, Reynold Xin, Matei Zaharia

Challenges and Opportunities for Autonomous Vehicle Query Systems

Fiodar Kazhamiaka, Matei Zaharia, Peter Bailis

Session 7: Data Structures

The Case for Distance-Bounded Spatial Approximations

Eleni Tzirita Zacharatou, Andreas Kipf, Ibrahim Sabek, Varun Pandey, Harish Doraiswamy, Volker Markl

Hist-Tree: Those Who Ignore It Are Doomed to Learn

Andrew Crotty

Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems

Ling Zhang, Matthew Butrovich, Tianyu Li, Yash Nannapaneni, Andrew Pavlo, John Rollinson, Huanchen Zhang

Session 8: Privacy and Security

Integrity-based Attacks for Encrypted Databases and Implications

Arvind Arasu, Raghav Kaushik, Donald Kossmann, Ravi Ramamurthy

Encrypted Databases: From Theory to Systems

Zheguang Zhao, Seny Kamara, Tarik Moataz, Stan Zdonik

Sypse: Privacy-first Data Management through Pseudonymization and Partitioning

Amol Deshpande

Session 9: Platforms for Machine Learning

Cerebro: A Layered Data Platform for Scalable Deep Learning

Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, Kabir Nagrecha

Ease.ML: A Lifecycle Management System for Machine Learning

Leonel Aguilar, David Dao, Shaoduo Gan, Nezihe Merve Gurel, Nora Hollenstein, Jiawei Jiang, Bojan Karlas, Thomas Lemmin, Tian Li, Yang Li, Susie Rao, Johannes Rausch, Cedric Renggli, Luka Rimanic, Maurice Weber, Shuai Zhang, Zhikuan Zhao, Kevin Schawinski, Wentao Wu, Ce Zhang

Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines

Stefan Grafberger, Julia Stoyanovich, Sebastian Schelter

Session 10: Storage and Performance

Bridging the Chasm between Science and Reality

Martin Kersten, Panagiotis Koutsourakis, Niels Nes, Ying Zhang

Computational Storage: Where Are We Today?

Antonio Barbalace, Jaeyoung Do

Universal Layout Emulation for Long-Term Database Archival

Raja Appuswamy, Vincent Joguin

Extended Abstracts

Accelerating Queries over Unstructured Data with ML

Daniel Kang

Hamming Tree: The Case for Memory-Aware Bit Flipping Reduction for NVM Indexing

Saeed Kargar, Faisal Nawab

Cloud Observability: A MELTing Pot for Petabytes of Heterogenous Time Series

Suman Karumuri, Franco Solleza, Stan Zdonik, Nesime Tatbul

DataSense: Display Agnostic Data Documentation

Poonam Kumari, Michael Brachmann, Oliver Kennedy, Su Feng, Boris Glavic

White-Box OLAP Performance Modeling for the Cloud

Maximilian Kuschewski, Viktor Leis

Automating State Management in Computational Notebooks

Stephen Macke

The Need for a New I/O Model

Tarikul Islam Papon, Manos Athanassoulis

Scaling Data Science does not mean Scaling Machines

Devin Petersohn

Data Cleaning in the Era of Data Science: Challenges and Opportunities

El Kindi Rezig

Using Deep Learning Models to Replace Large Materialized Views in Relational Database

Jia Zou