go back
go back
Volume 17, No. 11
Efficient k-Clique Count Estimation with Accuracy Guarantee
Abstract
Counting and enumerating all occurrences of ๐-cliques, i.e., complete subgraphs with ๐ vertices, in a large graph ๐บ is a fundamental problem with many applications. However, exact solutions are often infeasible due to the exponential growth in the number of ๐-cliques when ๐ increases. Thus, a more practical approach is approximately counting and uniformly sampling ๐-cliques. Turรกn-Shadow and DPColorPath are two state-of-the-art algorithms for approximately counting ๐-cliques. The general idea is first constructing a sample space that is a superset of all ๐-cliques in ๐บ, and then sampling ๐ก elements uniformly-at-random (u.a.r.) from the sample space for a pre-determined ๐ก ; the ๐ -clique count is estimated as the sample space size multiplied by the ratio of ๐-cliques among the ๐ก samples. Although techniques have been proposed in Turรกn-Shadow for setting ๐ก to ensure the estimation accuracy, the theoretically chosen ๐ก is often too large to be practical. As a result, both of the existing algorithms used a fixed ๐ก in their implementations and thus do not offer accuracy guarantee. In this paper, we propose the first randomized algorithm that achieves the theoretical estimation accuracy and the practical efficiency at the same time. Different from the existing algorithms, we pre-determine the number ๐ of ๐-clique samples that are required to achieve the estimation accuracy. Consequently, we can estimate the running time of the sampling stage (i.e., time taken to sample ๐ ๐-cliques), for a given sample space. Then, we propose to balance the time of constructing/refining the sample space and the time of the sampling stage, by stopping the refinement of the sample space once the elapsed time is comparable to the estimated time of the sampling stage. Extensive empirical studies on large real graphs show that our algorithm SR-kCCE provides an accurate ๐-clique count estimation and also runs efficiently. As a by-product, our algorithm can also be used for efficiently sampling a certain number of ๐-cliques u.a.r. from ๐บ.
PVLDB is part of the VLDB Endowment Inc.
Privacy Policy