Volume 15, No. 9
SANCUS: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks
Abstract
Graph data are prevalent in modeling many real-life applications such as social networks. In this setting, graph neural networks (GNNs) stand out among various techniques due to their success at graph representation learning. However, although GNNs have exhibited formidable power, they still suffer from inefficiency and an inability to scale to large graphs, which motivates distributed GNN processing. To avoid the communication bottleneck of distributed GNNs, which lies in the expensive data movement between workers, we accelerate distributed GNNs with a heterogeneity-aware, communication-avoiding decentralized training framework. Our framework abstracts GNN processing as sequential matrix multiplications and caches historical features to avoid communication. We introduce a set of novel bounded feature-staleness metrics to mitigate system heterogeneity and adaptively skip broadcast communication. With bounded feature staleness in the decentralized scheme, we theoretically derive approximation error bounds on the intermediate features and gradients, which guarantee convergence. Empirically, we incorporate our framework with common GNN models to show its ability to generalize. Results on large-scale benchmark graph datasets demonstrate the efficiency and effectiveness of the proposed framework: compared to state-of-the-art works, we avoid up to 74% of communication without accuracy loss.
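To make the broadcast-skipping idea concrete, the following is a minimal, self-contained Python sketch (not the authors' code) of one way a worker could cache historical partition embeddings and skip a broadcast round whenever a bounded staleness check passes; the names staleness_bound, max_skips, embedding_drift, and maybe_broadcast are illustrative assumptions, not identifiers from the paper.

import numpy as np

def embedding_drift(current: np.ndarray, cached: np.ndarray) -> float:
    """Relative change between fresh and cached embeddings (one possible staleness metric)."""
    return np.linalg.norm(current - cached) / (np.linalg.norm(cached) + 1e-12)

def maybe_broadcast(local_emb, cache, worker_id, staleness_bound=0.1, max_skips=4):
    """Decide whether this worker re-broadcasts its partition embeddings.

    Returns the embeddings peers should aggregate with and whether a broadcast happened.
    """
    cached_emb, skips = cache.get(worker_id, (None, 0))
    stale_ok = (
        cached_emb is not None
        and skips < max_skips
        and embedding_drift(local_emb, cached_emb) <= staleness_bound
    )
    if stale_ok:
        # Skip communication: peers keep aggregating with the cached (stale) features.
        cache[worker_id] = (cached_emb, skips + 1)
        return cached_emb, False
    # Staleness bound violated (or cache cold): refresh the cache and broadcast.
    cache[worker_id] = (local_emb.copy(), 0)
    return local_emb, True

# Toy usage: this worker's embeddings drift slowly, so most rounds skip the broadcast.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))
cache = {}
for step in range(6):
    emb = emb + 0.01 * rng.normal(size=emb.shape)  # small per-step update
    _, did_broadcast = maybe_broadcast(emb, cache, worker_id=0)
    print(f"step {step}: broadcast={did_broadcast}")

In this sketch, the staleness check bounds both the drift of the cached features and the number of consecutive skipped rounds, which mirrors the abstract's claim that bounded staleness is what permits the convergence guarantees.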