# Instructions for Reproducing Results

Title: Deep-testing: the case of dependence detection
Authors: Gery Geenens (ggeenens@unsw.edu.au), Pierre Lafaye De Micheaux (lafaye@unsw.edu.au) and Ivan Muyun Zou (muyun.zou@unswalumni.com)
Institution: School of Mathematics and Statistics, UNSW Sydney, Australia
Journal: 
Date: 27/04/2026

## Step 1: Generation of training samples 

Note: These are not used for performance evaluation. 

The generation of $N = 40,000$ training samples is done by running the R scripts [`0_Training/generate_train_indep.R`](0_Training/generate_train_indep.R) and [`0_Training/generate_train_dep.R`](0_Training/generate_train_dep.R), for the $20,000$ independent and $20,000$  dependent samples respectively, the latter consisting of $1,000$ samples from each of 20 dependence models. For more details, see our article, Section V.A and Appendix A.

The results are stored (as `fst` files; Klik, M. (2022). fst: Lightning Fast Serialization of Data Frames. R package, version 0.9.8.) in subfolders of the folders `independent_samples/` and `dependent_samples/`, respectively. Two files are stored per combination of sample size ($n$ in 50, 100, 200, 400): one for the vectors containing the greyscale density images, and one for the vectors containing both the values of the 19 dependence indicators and the sample size.
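As a hedged illustration (not one of the repository scripts), the round trip through an `fst` file looks like this; the file name and toy feature table below are our own stand-ins:

```r
# Illustration only: store a toy table of features the way the training
# features are stored, then read it back. The file name is hypothetical.
library(fst)

features <- data.frame(matrix(runif(5 * 19), nrow = 5))  # toy stand-in for the 19 indicators
names(features) <- paste0("indicator_", 1:19)
features$n <- 100L                                       # the sample-size column

write_fst(features, "train_features_n100.fst")           # fast binary write
restored <- read_fst("train_features_n100.fst")          # read the whole table back
```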

The main functions used at this step are:

- `depstats::depgen()`: Dependent sample generation
- `depstats::indgen()`: Independent uniform bivariate sample generator
- `depstats::sampleapply()`: Compute features (competitor scores or image greyscale pixels)
- `normnoise()`: Add normally distributed noise
- `varnormnoise()`: Add normally distributed noise with non-constant variance
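As a rough sketch of what a greyscale density image feature can look like (the 32-pixel resolution and the rescaling to $[0,1]$ are our assumptions, not necessarily what `depstats::sampleapply()` does):

```r
# Hedged sketch: turn a bivariate sample into a p x p greyscale image by
# counting points per pixel and rescaling the counts to [0, 1].
sample_to_image <- function(x, y, p = 32) {
  bx <- cut(x, breaks = seq(min(x), max(x), length.out = p + 1), include.lowest = TRUE)
  by <- cut(y, breaks = seq(min(y), max(y), length.out = p + 1), include.lowest = TRUE)
  counts <- table(bx, by)          # p x p pixel counts
  as.vector(counts / max(counts))  # flatten to a vector, greyscale in [0, 1]
}

set.seed(1)
img <- sample_to_image(runif(200), runif(200))
```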


## Step 2: Training of Deep Learning Models

The R scripts to train our three deep learning models (All-CNN-MLP, All-MLP and All-CNN) are `1_Models/fit_combined_CNN_MLP.R`, `1_Models/fit_MLP_on_scores.R` and `1_Models/fit_CNN_on_images.R`, respectively. For more details, see our article, Section V.B.

These three scripts use the data generated in Step 1.

The data used during the training phase, as well as some other objects (e.g., the training history), are saved as single R objects via the `base::saveRDS()`
function (`.rds` files).

The neural network model architectures are stored in the HDF5 format (`.h5` files).
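A minimal sketch of the `.rds` round trip (the history object here is a toy stand-in; a fitted keras model would additionally go to `.h5`, e.g. via `keras::save_model_hdf5()`):

```r
# Hedged sketch: save a single R object to an .rds file and restore it.
history <- list(loss = c(0.7, 0.4, 0.2), accuracy = c(0.5, 0.8, 0.9))  # toy stand-in
path <- file.path(tempdir(), "history.rds")
saveRDS(history, path)              # one object per .rds file
restored_history <- readRDS(path)   # restore it later in another session
```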

## Step 3: Computation of near-exact critical values

Using the same approach as in Step 1, we generate $N=50,000$ independent samples with sample sizes 30, 50, 100, 200, 300, 400 using the R script [`2_Threshold/generate_threshold_indep.R`](2_Threshold/generate_threshold_indep.R). Computing time: a few days.

These generated data are stored in the six subfolders of folder `2_Threshold/threshold`.

Then, using the R script [`2_Threshold/I_compute_thresholds.R`](2_Threshold/I_compute_thresholds.R), these data are fed to the deep learning models fitted (trained) in Step 2 to estimate the 95% quantiles and the associated 5% critical values. For more details, see our article, Section V.C.
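The principle can be sketched as follows, with simulated stand-ins for the model outputs (the actual scores come from the fitted networks):

```r
# Hedged sketch: a near-exact 5% critical value is the empirical 95% quantile
# of the test scores computed on N independent (null) samples.
set.seed(2)
null_scores <- runif(50000)                  # stand-in for model outputs under H0
crit <- quantile(null_scores, probs = 0.95)  # reject independence when score > crit
```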

All the critical values are saved in the `2_Threshold/threshold` folder as single R objects (via the `base::saveRDS()` function).
There, we also store the critical values for all 19 dependence indicators.


## Step 4: Generation of testing samples

To generate testing sets for performance evaluation, run the scripts below. For more details, see our article, Section V.D and Appendix B. All testing samples are stored using the `fst` format in subfolders of the `3_Testing/` folder.

Sample sizes considered: $n^{\text{new}} \in \{30,50,100,200,300,400\}$.


**Experiment 1:** $20,000$ samples

- Run [`3_Testing/dependent_tests/run_all.R`](3_Testing/dependent_tests/run_all.R). Computing time: a few days.

**Experiment 2:** $12,000$ samples

- Run [`3_Testing/generate_test_additional.R`](3_Testing/generate_test_additional.R); $8,000$ samples. Additive noise was introduced only for models 22 and 23, using the R function `depstats::normnoise()`. Computing time: a few days. 
- Run [`3_Testing/image_tests/generate_test_images.R`](3_Testing/image_tests/generate_test_images.R); $4,000$ samples

**Not presented in the article:** (4 modified dependence models)

In order to check the robustness of the deep-testing procedure against increasing noise, we also generated
$16,000$ samples from 4 selected models of dependence ($\tilde{N}=4\times 1,000$ each), namely slight modifications of:
Model 7 – Circles, Model 8 – Cross, Model 14 – Sine and Model 18 – Spiral, with increasing noise levels. See below.
These $\tilde{N}$ samples are independent, but not identically distributed: we generated them using only the first additive noise structure (implemented via the R function `depstats::normnoise()`) with four different levels of additive noise: $\tilde{N}_1=1,000$ with noise level $\sigma=0.25$, $\tilde{N}_2=1,000$ with noise level $\sigma=0.5$, $\tilde{N}_3=1,000$ with noise level $\sigma=0.75$ and $\tilde{N}_4=1,000$ with noise level $\sigma=1$ (so that $\tilde{N}=\tilde{N}_1+\tilde{N}_2+\tilde{N}_3+\tilde{N}_4$).


* **Modified Model 7 – Circles.** Take instead
  $$
  S = \{(U_{2i-1},U_{2i}):i=1,\ldots,N\},\qquad N\sim\text{Unif}\{1,2,\ldots,5\},
  $$
  where the $U$'s are all independent $\text{Unif}[-2,2]$, and $\sigma_k^{(\ell)}$ is fixed (non-random) in $\{0.25,0.5,0.75,1\}$. Also, no rotation was applied.

* **Modified Model 8 – Cross.** Take instead $\sigma_k^{(\ell)}$ fixed (non-random) in $\{0.25,0.5,0.75,1\}$ and $\theta^{(\ell)}\sim\text{Unif}[0,\pi/6]$ for the angle of the 2D rotation matrix $\Theta^{(\ell)}$.

* **Modified Model 14 – Sine.** Take instead $\sigma_k^{(\ell)}$ fixed (non-random) in $\{0.25,0.5,0.75,1\}$. Also, no rotation was applied.

* **Modified Model 18 – Spiral.** Take instead $\sigma_k^{(\ell)}$ fixed (non-random) in $\{0.25,0.5,0.75,1\}$ and $k_\ell\sim\text{Unif}[2,4]$. Also, no rotation was applied.

- Run [`3_Testing/generate_test_noise.R`](3_Testing/generate_test_noise.R); Computing time: a few days.
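A hedged sketch of what generating one of these noisy samples can look like, taking the modified Model 14 (Sine) as an example; the range of $X$ and the sine form used here are our assumptions, not the repository's exact model:

```r
# Hedged sketch: dependent (sine-shaped) samples with a fixed, non-random
# noise level sigma taken from {0.25, 0.5, 0.75, 1} (1,000 samples per level).
gen_sine <- function(n, sigma) {
  x <- runif(n, -pi, pi)
  y <- sin(x) + rnorm(n, sd = sigma)  # additive Gaussian noise, constant sigma
  cbind(x = x, y = y)
}

set.seed(3)
noisy_samples <- lapply(c(0.25, 0.5, 0.75, 1), function(s) gen_sine(100, s))
```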


## Step 5: Compute powers

For more details, see our article, Section VI.

- Run [`4_Powers/II_compute_powers.R`](4_Powers/II_compute_powers.R) to compute powers for the trained models as well as other statistical tests. (They are stored in `4_Powers/`, using the RDS format).
- Run [`4_Powers/III_plot_powers.R`](4_Powers/III_plot_powers.R) to generate the power curves in `PDF` format.
- Run [`4_Powers/IV_print_powers.R`](4_Powers/IV_print_powers.R) to generate power tables in `LaTeX` format.
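The power computation itself reduces to a rejection rate; a hedged sketch with simulated stand-ins for the scores and the critical value:

```r
# Hedged sketch: empirical power = proportion of test scores (computed on the
# dependent testing samples) exceeding the critical value from Step 3.
# All values here are simulated stand-ins, not the article's results.
set.seed(4)
crit <- 0.95                       # stand-in 5% critical value
scores <- runif(1000, 0.90, 1.00)  # stand-in scores under dependence
power <- mean(scores > crit)       # rejection rate
```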



