Volume 15, No. 6

ByteGNN: Efficient Graph Neural Network Training at Large Scale

Authors:
Chenguang Zheng (CUHK), Hongzhi Chen (ByteDance), Yuxuan Cheng (ByteDance Inc), Zhezheng Song (CUHK), Yifan Wu (Peking University), Changji Li (CUHK), James Cheng (CUHK), Hao Yang (ByteDance), Shuai Zhang (ByteDance)

Abstract

Graph neural networks (GNNs) have shown excellent performance in a wide range of applications such as recommendation, risk control, and drug discovery. As the volume of graph data grows, distributed GNN systems become essential to support efficient GNN training. However, existing distributed GNN training systems suffer from various performance issues, including high network communication cost, low CPU utilization, and poor end-to-end performance. In this paper, we propose ByteGNN, which addresses the limitations of existing distributed GNN systems with three key designs: (1) an abstraction of mini-batch graph sampling to support high parallelism, (2) a two-level scheduling strategy to improve resource utilization and reduce the end-to-end GNN training time, and (3) a graph partitioning algorithm tailored for GNN workloads. Our experiments show that ByteGNN outperforms the state-of-the-art distributed GNN systems, with 3.5-23.8 times faster end-to-end execution, 2-6 times higher CPU utilization, and around half of the network communication cost.
