Volume 15, No. 9
SANCUS: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks
Abstract
Graph data are prevalent in modeling many real-life applications such as social networks. In this setting, graph neural networks (GNNs) stand out among various techniques due to their success at graph representation learning. However, although GNNs have exhibited formidable power, they still suffer from inefficiency and an inability to scale to large graphs, which motivates distributed GNN processing. To avoid the communication bottleneck of distributed GNNs, which lies in the expensive data movement between workers, we accelerate distributed GNNs with a heterogeneity-aware, communication-avoiding decentralized training framework. Our framework abstracts GNN processing as sequential matrix multiplications and caches historical features to avoid communication. We introduce a set of novel bounded feature-staleness metrics to mitigate system heterogeneity and adaptively skip broadcast communication. With bounded feature staleness in the decentralized scheme, we theoretically derive approximation error bounds on the intermediate features and gradients, which guarantee convergence. Empirically, we incorporate our framework with common GNN models to show its ability to generalize. Results on large-scale benchmark graph datasets demonstrate the efficiency and effectiveness of the proposed framework: compared to state-of-the-art works, we avoid up to 74% of communication without accuracy loss.
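To make the broadcast-skipping idea concrete, the following is a minimal, self-contained Python sketch (not the authors' code) of one way a worker could cache historical partition embeddings and skip a broadcast round whenever a bounded staleness check passes; the names staleness_bound, max_skips, embedding_drift, and maybe_broadcast are illustrative assumptions, not identifiers from the paper.

import numpy as np

def embedding_drift(current: np.ndarray, cached: np.ndarray) -> float:
    """Relative change between fresh and cached embeddings (one possible staleness metric)."""
    return np.linalg.norm(current - cached) / (np.linalg.norm(cached) + 1e-12)

def maybe_broadcast(local_emb, cache, worker_id, staleness_bound=0.1, max_skips=4):
    """Decide whether this worker re-broadcasts its partition embeddings.

    Returns the embeddings peers should aggregate with and whether a broadcast happened.
    """
    cached_emb, skips = cache.get(worker_id, (None, 0))
    stale_ok = (
        cached_emb is not None
        and skips < max_skips
        and embedding_drift(local_emb, cached_emb) <= staleness_bound
    )
    if stale_ok:
        # Skip communication: peers keep aggregating with the cached (stale) features.
        cache[worker_id] = (cached_emb, skips + 1)
        return cached_emb, False
    # Staleness bound violated (or cache cold): refresh the cache and broadcast.
    cache[worker_id] = (local_emb.copy(), 0)
    return local_emb, True

# Toy usage: this worker's embeddings drift slowly, so most rounds skip the broadcast.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))
cache = {}
for step in range(6):
    emb = emb + 0.01 * rng.normal(size=emb.shape)  # small per-step update
    _, did_broadcast = maybe_broadcast(emb, cache, worker_id=0)
    print(f"step {step}: broadcast={did_broadcast}")

In this sketch, the staleness check bounds both the drift of the cached features and the number of consecutive skipped rounds, which mirrors the abstract's claim that bounded staleness is what permits the convergence guarantees.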