Volume 17, No. 11
OUTRE: An OUT-of-core De-REdundancy GNN Training Framework for Massive Graphs within A Single Machine
Abstract
Sampling-based Graph Neural Networks (GNNs) have become the de facto standard for handling various graph learning tasks on large-scale graphs. As graphs grow larger and even exceed the standard host memory size of a single machine, out-of-core sampling-based GNN training has gained attention from the community. In this setting, the performance bottleneck is the data preparation process, which includes sampling neighbor lists and gathering node features from external storage. Based on this observation, existing out-of-core GNN training frameworks try to serve a larger fraction of data requests without querying external storage by designing better in-memory caches. However, this approach leaves the enormous overall volume of requested data unchanged. In this paper, we present a new perspective: reducing the overall requested data volume. Through a quantitative analysis, we find that two kinds of data redundancy, Neighborhood Redundancy and Temporal Redundancy, exist in out-of-core sampling-based GNN training. To reduce these two kinds of data redundancy, we propose OUTRE, an OUT-of-core de-REdundancy GNN training framework. OUTRE incorporates two new designs, partition-based batch construction and a historical embedding cache, to reduce the corresponding redundancies. Moreover, we propose automatic cache space management to automatically organize the available memory across the different caches. Evaluation results on four public large-scale graph datasets show that OUTRE achieves 1.52× to 3.51× speedup over the state-of-the-art framework.
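The abstract does not describe OUTRE's implementation, but the historical-embedding-cache idea it names (reusing a recently computed node embedding instead of re-fetching the node's features from external storage and recomputing) can be illustrated with a minimal, hypothetical Python sketch. All names here (HistoricalEmbeddingCache, staleness_limit, etc.) are illustrative assumptions, not OUTRE's actual API.

```python
import torch

class HistoricalEmbeddingCache:
    """Minimal sketch of a historical embedding cache (hypothetical API).

    Stores the embedding computed for a node at some past training step
    and serves it to later batches until it becomes too stale, so the
    node's raw features need not be re-fetched from external storage
    and its embedding need not be recomputed (temporal de-redundancy).
    """

    def __init__(self, num_nodes: int, dim: int, staleness_limit: int = 10):
        self.emb = torch.zeros(num_nodes, dim)            # cached embeddings
        self.step_written = torch.full((num_nodes,), -1)  # step each entry was cached at
        self.staleness_limit = staleness_limit

    def put(self, node_ids: torch.Tensor, embeddings: torch.Tensor, step: int):
        self.emb[node_ids] = embeddings.detach()
        self.step_written[node_ids] = step

    def lookup(self, node_ids: torch.Tensor, step: int):
        """Split node_ids into cache hits (with embeddings) and misses."""
        age = step - self.step_written[node_ids]
        fresh = (self.step_written[node_ids] >= 0) & (age <= self.staleness_limit)
        hits, misses = node_ids[fresh], node_ids[~fresh]
        return hits, self.emb[hits], misses


# Usage sketch: only misses take the expensive path (fetch features from
# disk, run GNN layers); hits skip that work entirely.
cache = HistoricalEmbeddingCache(num_nodes=1000, dim=64)
batch = torch.tensor([3, 7, 42])
hits, hit_emb, misses = cache.lookup(batch, step=5)
miss_emb = torch.randn(len(misses), 64)  # placeholder for the real computation
cache.put(misses, miss_emb, step=5)
```

The staleness limit reflects the usual trade-off with historical embeddings: older entries save more I/O and compute but approximate the current model parameters less accurately.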