Yellowbrick: An Elastic Data Warehouse on Kubernetes
Abstract
The Yellowbrick Data Warehouse delivers efficient, scalable and resilient data warehousing in public clouds and in private data centers. The database management system is composed of a set of Kubernetes-orchestrated microservices. Kubernetes provides the single source of truth for system configuration and state, and manages all data warehouse lifecycle operations, including the creation, expansion, contraction and destruction of elastic compute resources and shared services. The common runtime provided by Kubernetes enabled us to port the system to three different cloud providers in under a year. We created a SQL interface to Kubernetes to hide the details of the underlying microservices implementation from the end user. We also developed our own reliable network protocol, based on the Data Plane Development Kit (DPDK), for efficient data exchange between nodes in the public cloud. In this paper, we provide an overview of Yellowbrick and its microservices approach to delivering elasticity, scalability and separation of compute and storage. We also describe the optimizations we have implemented in the operating system and in our software to drive efficiency and performance, and support them with benchmark results. We conclude with lessons learned and a discussion of future developments.