go back
go back
Volume 18, No. 1
CUTTANA: Scalable Graph Partitioning for Faster Distributed Graph Databases and Analytics
Abstract
Graph partitioning plays a pivotal role in various distributed graph processing applications, including graph analytics, graph neural network training, and distributed graph databases. A “good” graph partitioner reduces workload execution time, worker imbalance, and network overhead. Graphs that require distributed settings are often too large to fit in the main memory of a single machine. This challenge renders traditional in-memory graph partitioners infea- sible, leading to the emergence of streaming solutions. Streaming partitioners produce lower-quality partitions, because they work from partial information and must make premature decisions be- fore they have a complete view of a vertex’s neighborhood. We introduce CUTTANA, a streaming graph partitioner that partitions massive graphs (Web/Twitter scale) with superior quality compared to existing streaming solutions. CUTTANA uses a novel buffering technique that prevents the premature assignment of vertices to partitions and a scalable coarsening and refinement technique that enables a complete graph view, improving the intermediate assign- ment made by a streaming partitioner. We implemented a parallel version for CUTTANA that offers nearly the same partitioning latency as existing streaming partitioners. Our experimental analysis shows that CUTTANA consistently yields better partitioning quality than state-of-the-art streaming vertex partitioners in terms of both edge-cut and communication volume metrics. We also evaluate the workload latencies that re- sult from using CUTTANA and other partitioners in distributed graph analytics and databases. CUTTANA outperforms the other methods in most scenarios (algorithms, datasets). In analytics ap- plications, CUTTANA improves runtime performance by up to 59% compared to various streaming partitioners (i.e., HDRF, Fennel, Ginger, HeiStream). In graph database tasks, CUTTANA results in higher query throughput by up to 23%, without hurting tail latency.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy