Graph Independence Testing

This repository contains the code for running the experiments in the manuscript: Xiong, Junhao, et al. “Graph Independence Testing.” arXiv preprint arXiv:1906.03661 (2019).

Both the manuscript and the code are currently under major revision, so you may not find the exact code needed to reproduce the figures in the manuscript. For more up-to-date results, you may consult the slides here.

Files

The core functionality is in core.py, which contains the functions and utilities needed to compute the test statistic, p-value, and power of naive Pearson, gcorr (graph correlation), and gcorrDC (a DC-SBM version of gcorr). Note that gcorr is slightly modified from the test statistic in the manuscript so that it is an unbiased estimate of the actual correlation (rather than differing from it by a constant for SBMs).
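As a rough illustration of what a test statistic and p-value for a pair of graphs can look like, the sketch below computes a naive Pearson correlation over the upper-triangular entries of two undirected adjacency matrices and a Mantel-style vertex-permutation p-value. The function names `pearson_graph_stat` and `vertex_permutation_pvalue` are hypothetical and are not taken from core.py, and the permutation scheme is an assumption chosen for simplicity; the repository may compute its null distribution differently.

```python
import numpy as np


def pearson_graph_stat(A, B):
    """Pearson correlation between the upper-triangular entries of A and B.

    Assumes A and B are square, undirected, loopless adjacency matrices
    on the same vertex set (hypothetical helper, not the core.py API).
    """
    iu = np.triu_indices_from(A, k=1)
    return np.corrcoef(A[iu], B[iu])[0, 1]


def vertex_permutation_pvalue(A, B, n_perm=500, seed=0):
    """Two-sided p-value from relabeling the vertices of B (Mantel-style)."""
    rng = np.random.default_rng(seed)
    observed = pearson_graph_stat(A, B)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(A.shape[0])
        # Permute rows and columns together so B stays a valid graph.
        null[i] = pearson_graph_stat(A, B[np.ix_(perm, perm)])
    # Add-one correction keeps the p-value strictly positive.
    pval = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return observed, pval


# Small demo on a random undirected graph compared with itself.
rng = np.random.default_rng(1)
upper = np.triu(rng.random((30, 30)) < 0.3, k=1).astype(int)
A = upper + upper.T
stat, pval = vertex_permutation_pvalue(A, A, n_perm=200)
```

A plain vertex permutation ignores community structure, which is one reason a block-aware statistic such as gcorr may be preferable when the graphs follow an SBM.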

Simulations

simulations.py contains functions to simulate $\rho$-correlated Bernoulli SBMs, $\rho$-correlated Bernoulli DC-SBMs, and $\rho$-correlated Gaussian SBMs (based on the graspologic implementations, but more general).
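To make the notion of a $\rho$-correlated Bernoulli SBM concrete, here is a minimal sketch of one standard construction: sample each edge of the first graph with its block probability $p$, then sample the matching edge of the second graph with probability $p + \rho(1 - p)$ if the first edge is present and $p(1 - \rho)$ otherwise. This keeps the marginal edge probability at $p$ and gives each edge pair correlation $\rho$. The function name `sample_corr_bernoulli_sbm` and its arguments are hypothetical and do not reflect the interface of simulations.py; the sketch assumes undirected, loopless graphs and $0 \le \rho \le 1$.

```python
import numpy as np


def sample_corr_bernoulli_sbm(block_sizes, block_probs, rho, seed=None):
    """Sample a pair of rho-correlated Bernoulli SBM graphs (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Vertex labels, e.g. block_sizes=[2, 3] -> [0, 0, 1, 1, 1].
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Edge probability for every vertex pair, looked up from the block matrix.
    P = np.asarray(block_probs, dtype=float)[np.ix_(labels, labels)]
    n = labels.size
    iu = np.triu_indices(n, k=1)
    p = P[iu]

    # Edges of the first graph.
    a = rng.random(p.size) < p
    # Conditional edge probability of the second graph given the first.
    p_b = np.where(a, p + rho * (1 - p), p * (1 - rho))
    b = rng.random(p.size) < p_b

    # Fill the upper triangle and symmetrize.
    A = np.zeros((n, n), dtype=int)
    B = np.zeros((n, n), dtype=int)
    A[iu] = a
    B[iu] = b
    return A + A.T, B + B.T, labels


A, B, labels = sample_corr_bernoulli_sbm(
    block_sizes=[20, 20],
    block_probs=[[0.7, 0.3], [0.3, 0.7]],
    rho=0.5,
    seed=0,
)
```

Setting `rho=0` yields two independent graphs with the same block structure, and `rho=1` yields two identical graphs.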

The following files correspond (roughly) to figures in the manuscript. Results can be viewed here.

experiments/sim_teststats.py and plotting/plot_sim_test_statistic.py are used to generate Figure 1.
experiments/sim_power.py and plotting/plot_sim_power.py are used to generate Figures 3 and 4.

Real data experiments

This directory currently contains the code to run experiments on the following datasets:

mouse: a dataset containing connectomes of 4 different species of mice. See some results here.
timeseries: a dataset containing the connectome of a single subject sequenced over many time points.
cpac200: a dataset with connectomes from different subjects.
enron: a dataset where each graph represents email correspondence between subjects in a network.

To run experiments on one of these datasets, the standard workflow is as follows:

Preprocess the raw dataset into a numpy.array with the following format: [# graphs, # vertices, # vertices]. You may need to write some code for this, but it should be straightforward using the functions available in data_utils/.
(optional) Apply a transformation to the graphs using experiments/real_transfrom_data.py
(optional) Estimate community assignments of the graphs using experiments/real_community_estimation.py, if the test statistic and p-value methods you are using require community assignments to be given.
Run experiments/real_teststats_pval.py with the appropriate command-line arguments.
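The preprocessing step in the workflow above can be sketched as follows: collect one adjacency matrix per graph, check that they all share the same vertex count, and stack them into a single array of shape [# graphs, # vertices, # vertices]. The helper name `stack_graphs` is hypothetical and is not part of data_utils/; how the resulting array should be saved and passed to the experiment scripts is not shown here because it depends on their command-line arguments.

```python
import numpy as np


def stack_graphs(adjacency_matrices):
    """Stack square adjacency matrices into a (n_graphs, n_vertices, n_vertices) array.

    Hypothetical helper for illustration; raises ValueError if any matrix
    is not square or does not match the size of the first one.
    """
    mats = [np.asarray(m, dtype=float) for m in adjacency_matrices]
    if not mats:
        raise ValueError("need at least one adjacency matrix")
    n = mats[0].shape[0]
    for i, m in enumerate(mats):
        if m.shape != (n, n):
            raise ValueError(
                f"graph {i} has shape {m.shape}, expected {(n, n)}"
            )
    return np.stack(mats, axis=0)


# Three toy graphs on four vertices each.
rng = np.random.default_rng(0)
toy_graphs = [rng.integers(0, 2, size=(4, 4)) for _ in range(3)]
graphs = stack_graphs(toy_graphs)
```

Validating shapes up front makes a mismatched vertex set fail early, before any test statistic is computed.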

Current limitations

Currently, simulation results look good, but the main problem is that we seem to have a large type I error inflation in the real data (the test almost always rejects the null, so we get very low p-values across the board, even when we don’t think there should be actual dependence). One proposed fix is to use a DC-SBM-based test, which seems to work in simulation when the generating models are DC-SBMs, but in real data it still doesn’t seem to decrease the test statistic or produce more reasonable p-values.
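One way to see how unmodeled structure shared by two graphs can push a correlation statistic away from zero, even when the graphs are independent, is the toy simulation below. It samples two independent 2-block SBMs with the same community labels and compares the naive Pearson correlation with a version that subtracts the estimated within-block and between-block edge densities first. This is only an illustrative sketch under assumed parameters (`n_per_block`, `p_within`, `p_between` are arbitrary choices); it is not the gcorr or gcorrDC implementation, and it does not claim to explain the behavior observed on the real datasets.

```python
import numpy as np


def independent_sbm_pair(n_per_block=50, p_within=0.5, p_between=0.1, seed=0):
    """Upper-triangular edges of two independent SBMs with shared labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_block)
    P = np.where(labels[:, None] == labels[None, :], p_within, p_between)
    iu = np.triu_indices(labels.size, k=1)
    # The two graphs use separate random draws, so they are independent.
    a = (rng.random(iu[0].size) < P[iu]).astype(float)
    b = (rng.random(iu[0].size) < P[iu]).astype(float)
    same_block = labels[iu[0]] == labels[iu[1]]
    return a, b, same_block


def block_centered(x, same_block):
    """Subtract the estimated edge density separately within and between blocks."""
    out = x.copy()
    for mask in (same_block, ~same_block):
        out[mask] -= x[mask].mean()
    return out


a, b, same_block = independent_sbm_pair()

# Naive Pearson treats all vertex pairs as exchangeable.
naive = np.corrcoef(a, b)[0, 1]

# Centering by block removes the correlation induced by shared structure.
adjusted = np.corrcoef(
    block_centered(a, same_block), block_centered(b, same_block)
)[0, 1]
```

In this toy setting the naive statistic is clearly positive while the block-centered one is close to zero. If real graphs share structure that neither an SBM nor a DC-SBM captures, a similar effect could contribute to spuriously small p-values.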

Also, the test statistics seem to reflect meaningful differences in some datasets (e.g. mouse), but not in others (e.g. timeseries, cpac200). It is unclear whether this is because the signal is simply not present in those datasets, because the test is not powerful enough to detect it, or because of some preprocessing choices (e.g. choosing the appropriate trimming values for DC-SBM).

Some attempts to address the aforementioned problems can be seen here.

Setup

To run code in this repository, first install Python 3.6. You can use pyenv to manage the Python versions on your machine.

Next, set up the local environment in the ./venv directory:

python -m venv ./venv

To activate the environment, type:

. venv/bin/activate

Then, install the requirements in the local environment:

pip install --upgrade pip
pip install -r requirements.txt
