
Volume 17, No. 11

A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs

Authors:
Yun Wang, Chrysanthi Kosyfaki, Sihem Amer-Yahia, Reynold Cheng

Abstract

Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses on attributed graphs. We develop a sampling-based hypothesis testing framework, which can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and time-efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in the hypothesis. We further optimize its time efficiency and propose PHASEopt. Experiments on three real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling methods in terms of accuracy and time efficiency.
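To make the "sampling-based" idea concrete, here is a minimal sketch of one hypothesis-agnostic pipeline of the kind such a framework could plug in. It is not PHASE or PHASEopt, whose details the abstract does not give. The sampler is the classic m-dimensional random walk (frontier sampling, Ribeiro and Towsley): m walkers move on the graph, and at each step one walker, chosen with probability proportional to its current degree, moves to a uniformly random neighbour. The test is a two-sided z-test of a node hypothesis "the mean value of a node attribute equals mu0". Several choices here are assumptions for illustration only: the function names `frontier_sample`, `node_mean_z_test` and `wheel_graph`, the adjacency-dict graph format, the 1/degree reweighting used to undo the walk's degree bias, and treating walk samples as approximately i.i.d. when estimating the variance.

```python
import math
import random
from statistics import NormalDist


def frontier_sample(adj, m, steps, rng):
    """Hypothetical m-dimensional random walk (frontier sampling).

    adj maps each node to a list of neighbours; every node needs degree >= 1.
    Keeps m walkers; each step picks one with probability proportional to its
    current degree and moves it to a uniformly random neighbour.
    Returns the sequence of nodes visited by the moved walkers.
    """
    nodes = list(adj)
    walkers = [rng.choice(nodes) for _ in range(m)]
    visited = []
    for _ in range(steps):
        degrees = [len(adj[u]) for u in walkers]
        i = rng.choices(range(m), weights=degrees, k=1)[0]
        walkers[i] = rng.choice(adj[walkers[i]])
        visited.append(walkers[i])
    return visited


def node_mean_z_test(samples, adj, attr, mu0):
    """Two-sided z-test of H0: mean of attr over all nodes == mu0.

    Walk samples are degree-biased, so each sample gets weight 1/degree
    (self-normalised importance weighting). The variance formula assumes
    i.i.d. samples, which a random walk only approximates.
    Returns (estimated mean, z statistic, p-value).
    """
    w = [1.0 / len(adj[v]) for v in samples]
    x = [attr[v] for v in samples]
    sw = sum(w)
    mu_hat = sum(wi * xi for wi, xi in zip(w, x)) / sw
    var_hat = sum((wi * (xi - mu_hat)) ** 2 for wi, xi in zip(w, x)) / sw ** 2
    if var_hat == 0.0:
        # Degenerate case: every sampled value is identical.
        if mu_hat == mu0:
            return mu_hat, 0.0, 1.0
        return mu_hat, math.copysign(math.inf, mu_hat - mu0), 0.0
    z = (mu_hat - mu0) / math.sqrt(var_hat)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return mu_hat, z, p


def wheel_graph(n_rim):
    """Hub node 0 joined to rim nodes 1..n_rim, which also form a cycle."""
    adj = {0: list(range(1, n_rim + 1))}
    for i in range(1, n_rim + 1):
        left = n_rim if i == 1 else i - 1
        right = 1 if i == n_rim else i + 1
        adj[i] = [0, left, right]
    return adj


# Usage: the hub has a large attribute value and a high degree, so an
# unweighted walk average would overestimate the true node mean.
adj = wheel_graph(20)
attr = {0: 100.0, **{i: float(i) for i in range(1, 21)}}
true_mean = sum(attr.values()) / len(attr)
rng = random.Random(7)
samples = frontier_sample(adj, m=5, steps=20000, rng=rng)
mu_hat, z, p = node_mean_z_test(samples, adj, attr, mu0=true_mean)
```

In this toy wheel graph, the hub accounts for a quarter of all edge endpoints, so the raw walk average drifts toward about 32.9, while the 1/degree-weighted estimate stays near the true node mean of 310/21 (about 14.76). This is the kind of bias a hypothesis-agnostic sampler must correct before a test is meaningful. A hypothesis-aware sampler such as PHASE presumably also steers the walk toward the nodes, edges or paths named in the hypothesis; this sketch does not attempt that.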

PVLDB is part of the VLDB Endowment Inc.
