Automated Performance Management for the Big Data Stack

Authors:
Anastasios Arvanitis, Shivnath Babu, Eric Chu, Adrian Popescu, Alkis Simitsis, Kevin Wilkinson
Abstract

More than 10,000 enterprises worldwide today use the big data stack, which is composed of multiple distributed systems. At Unravel, we have worked with a representative sample of these enterprises that covers most industry verticals. This sample also covers the spectrum of choices for deploying the big data stack: on-premises datacenters, private cloud deployments, public cloud deployments, and hybrid combinations of these. In this paper, we aim to bring attention to the performance management requirements that arise in big data stacks. We provide an overview of these requirements at the level of individual applications as well as at the level of holistic clusters and workloads. We present an architecture that can provide automated solutions for these requirements and then do a deep dive into a few of these solutions.