CIDR Proceedings



Yellowbrick: An Elastic Data Warehouse on Kubernetes

Authors:

Mark Cusack, John Adamson, Mark Brinicombe, Neil Carson, Thomas Kejser, Jim Peterson, Arvind Vasudev, Kurt Westerfeld


Abstract

The Yellowbrick Data Warehouse delivers efficient, scalable and resilient data warehousing in public clouds and in private data centers. The database management system is composed of a set of Kubernetes-orchestrated microservices. Kubernetes provides the single source of truth for system configuration and state, and manages all data warehouse lifecycle operations, including the creation, expansion, contraction and destruction of elastic compute resources and shared services. The common runtime provided by Kubernetes enabled us to port Yellowbrick to three different cloud providers in under a year. We created a SQL interface to Kubernetes to hide the details of the underlying microservices implementation from the end user. We also developed our own reliable network protocol based on the Data Plane Development Kit (DPDK) for efficient data exchange between nodes in the public cloud. In this paper, we provide an overview of Yellowbrick and its microservices approach to delivering elasticity, scalability and separation of compute and storage. We also describe the optimizations we have implemented in the operating system and in our software to drive efficiency and performance, supported by benchmark results. We conclude with lessons learned and discuss future developments.