go back

Volume 15, No. 8

Efficient Maximal Biclique Enumeration for Large Sparse Bipartite Graphs

Authors:
Lu Chen (Swinburne University of Technology)* Chengfei Liu (Swinburne University of Technology) Rui Zhou (Swinburne University of Technology) Jiajie Xu (Soochow University) Jianxin Li (Deakin University)

Abstract

Maximal bicliques are effective to reveal meaningful information hidden in a bipartite graph. Enumerating all the maximal bicliques in a large bipartite graph is challenging since the number of the maximal bicliques could grow exponentially w.r.t. the number of vertices in the bipartite graph in the worst case. However, most of the existing algorithms omit the fact that a large bipartite graph is usually very sparse, which is against the worst case and may lead to fast maximal biclique enumeration algorithms. The uncharted opportunity is taking advantage of the sparsity to substantially improve the efficiency of the maximal biclique enumeration for a large sparse bipartite graph. We observe that for a large sparse bipartite graph, a vertex $v$ may converge to a small number of vertices in the same vertex set as $v$ via the neighbours of $v$. This reveals that for a large sparse bipartite graph, the enumeration scope for a vertex could be very small. Based on the observation, we propose novel concepts: the unilateral coreness for individual vertices, the unilateral order for each vertex set and the unilateral convergence ($\varsigma$) for a large sparse bipartite graph. $\varsigma$ is only a few thousand for a large sparse bipartite graph with hundreds of million edges. Using the unilateral order, every vertex with a unilateral coreness of $\tau$ only needs to check at most $2^{\tau}$ combinations so that all the maximal bicliques can be enumerated, where $\tau$ is bounded by $\varsigma$. We propose a novel maximal biclique enumeration algorithm that runs in $\mathcal{O}^{*}(2^{\varsigma})$. Apart from that, we propose the batch-pivots technique, which eliminates all enumerations leading to non-maximal bicliques and guarantees that the proposed algorithm reports maximal bicliques with $\mathcal{O}(\varsigma e)$-delay, where $e$ is the number of edges. We devise novel data structures which allow storing subgraphs at omissible extra space consumption for further speeding up the enumeration. Extensive experiments are conducted on large real datasets, which justifies that our proposed algorithm runs faster and is more scalable than the existing algorithms.

PVLDB is part of the VLDB Endowment Inc.

Privacy Policy