CIDR Proceedings

This website is under development. If you come accross any issues, please report them to Konstantinos Kanellis (kkanellis@cs.wisc.edu) or Yannis Chronis (chronis@google.com).

Go Back

Topology-aware Parallel Data Processing: Models, Algorithms and Systems at Scale

Authors:

Spyros Blanas, Paraschos Koutris, Anastasios Sidiropoulos

Download PDF

Abstract

The analysis of massive datasets requires a large number of processors. Prior research has largely assumed that tracking the actual data distribution and the underlying network structure of a cluster, which we collectively refer to as the topology, comes with a high cost and has little practical beneﬁt. As a result, theoretical models, algorithms and systems often assume a uniform topology; however this assumption rarely holds in practice. This necessitates an end-to-end investigation of how one can model, design and deploy topology-aware algorithms for fundamental data processing tasks at large scale. To achieve this goal, we first develop a theoretical parallel model that can jointly capture the cost of computation and communication. Using this model, we explore algorithms with theoretical guarantees for three basic tasks: aggregation, join, and sorting. Finally, we consider the practical aspects of implementing topology-aware algorithms at scale, and show that they have the potential to be orders of magnitude faster than their topology-oblivious counterparts.