Data-adaptive Statistics for High-Dimensional Multiple Testing

Computes marginal average treatment effects of a binary point treatment on multi-dimensional outcomes, adjusting for baseline covariates, using Targeted Minimum Loss-Based Estimation. A data-mining algorithm is used to perform biomarker selection before multiple testing to increase power.
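The selection-then-testing idea can be illustrated with a simplified sketch in base R. It assumes a scheme in which biomarkers are ranked on the training part of each cross-validation fold, and the k-th ranked effect is then estimated on the held-out part and averaged over folds. It substitutes an unadjusted difference in means for TMLE and ignores W, so it is not the package's estimator. `cv_select_estimate` is a hypothetical helper, not part of adaptest.

```r
# Hypothetical helper (not part of adaptest). Simplifications: an unadjusted
# difference in means replaces TMLE, and baseline covariates W are ignored.
cv_select_estimate <- function(Y, A, n_top, n_fold) {
  Y <- as.matrix(Y)
  # assign folds separately within each treatment arm so that every fold
  # contains both treated and control units
  fold_id <- integer(length(A))
  for (a in c(0, 1)) {
    idx <- which(A == a)
    fold_id[idx] <- sample(rep_len(seq_len(n_fold), length(idx)))
  }
  est <- matrix(NA_real_, nrow = n_fold, ncol = n_top)
  for (v in seq_len(n_fold)) {
    train <- fold_id != v
    valid <- !train
    # 1) data-adaptive step: rank biomarkers using the training split only
    diff_train <- colMeans(Y[train & A == 1, , drop = FALSE]) -
      colMeans(Y[train & A == 0, , drop = FALSE])
    top <- order(diff_train, decreasing = TRUE)[seq_len(n_top)]
    # 2) estimation step: effects of the selected biomarkers on held-out data
    est[v, ] <- colMeans(Y[valid & A == 1, top, drop = FALSE]) -
      colMeans(Y[valid & A == 0, top, drop = FALSE])
  }
  # average the k-th ranked effect across folds
  colMeans(est)
}

set.seed(42)
A <- rep(c(0, 1), times = 30)              # 30 control and 30 treated units
Y <- matrix(sin(seq_len(600)), nrow = 60)  # bounded deterministic noise, 10 biomarkers
Y[A == 1, 1] <- Y[A == 1, 1] + 5           # only biomarker 1 has a true effect
est <- cv_select_estimate(Y, A, n_top = 3, n_fold = 3)
est
```

Because selection and estimation never use the same observations within a fold, the held-out estimates are not inflated by the selection step, which is the motivation for selecting before testing.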

adaptest(Y, A, W = NULL, n_top, n_fold, parameter_wrapper = rank_DE,
  learning_library = c("SL.glm", "SL.step", "SL.glm.interaction",
  "SL.gam", "SL.earth"), absolute = FALSE, negative = FALSE,
  p_cutoff = 0.05, q_cutoff = 0.05)

Arguments

Y	(numeric matrix or data.frame) - A `data.frame` or `matrix` of binary or continuous biomarker measures (outcome variables). Alternatively, this is an object of class `adapTMLE` if the wrapper `bioadaptest` is invoked (n.b., the wrapper is the preferred interface for standard data-analytic use cases arising in computational and genomic biology).
A	(numeric vector) - binary treatment indicator: `1` = treatment, `0` = control
W	(numeric vector, numeric matrix, or numeric data.frame) - matrix of baseline covariates, where each column corresponds to one baseline covariate and each row corresponds to one observation.
n_top	(integer vector) - number of top-ranked candidate covariates to select using the data-adaptive estimation algorithm.
n_fold	(integer vector) - number of cross-validation folds.
parameter_wrapper	(function) - user-defined function that takes the inputs (Y, A, W, absolute, negative) and returns an integer vector containing the ranks of the biomarkers (outcome variables). For details, refer to the documentation for `rank_DE`.
learning_library	(character vector) - library of learning algorithms used to fit the "Q" and "g" steps of the standard TMLE procedure.
absolute	(logical) - whether or not to test for absolute effect size. If `FALSE`, test for directional effect. This overrides argument `negative`.
negative	(logical) - whether or not to test for negative effect size. If `FALSE`, test for positive effect size. This takes effect only when `absolute = FALSE`.
p_cutoff	(numeric) - p-value cutoff (default 0.05) at or below which a biomarker is considered significant. Used in the inference stage.
q_cutoff	(numeric) - q-value cutoff (default 0.05) at or below which a biomarker is considered significant. Used in the multiple testing stage.
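A custom `parameter_wrapper` must follow the (Y, A, W, absolute, negative) signature described above. The sketch below assumes the wrapper should return one rank per biomarker in the column order of Y, with 1 for the most promising biomarker; check `rank_DE` for the exact contract. `rank_mean_diff` is a hypothetical function that ranks by an unadjusted difference in means, which is not what `rank_DE` does.

```r
# Hypothetical example wrapper, not part of adaptest: ranks biomarkers by an
# unadjusted treated-minus-control difference in means. W is accepted to match
# the required signature but is ignored here.
rank_mean_diff <- function(Y, A, W = NULL, absolute = FALSE, negative = FALSE) {
  Y <- as.matrix(Y)
  effect <- colMeans(Y[A == 1, , drop = FALSE]) -
    colMeans(Y[A == 0, , drop = FALSE])
  # `absolute` takes precedence over `negative`, as documented for `absolute`
  score <- if (absolute) {
    abs(effect)   # largest effects in either direction rank first
  } else if (negative) {
    -effect       # most negative effects rank first
  } else {
    effect        # most positive effects rank first
  }
  # rank 1 = most promising biomarker; ties broken by column order
  rank(-score, ties.method = "first")
}

# toy data: 4 units, 3 biomarkers
Y <- cbind(b1 = c(1, 2, 3, 4), b2 = c(5, 5, 1, 1), b3 = c(0, 0, 4, 4))
A <- c(0, 0, 1, 1)
rank_mean_diff(Y, A)                    # b1 = 2, b2 = 3, b3 = 1
rank_mean_diff(Y, A, negative = TRUE)   # b1 = 2, b2 = 1, b3 = 3
rank_mean_diff(Y, A, absolute = TRUE)   # b1 = 3, b2 = 1, b3 = 2
```

Such a function would then be passed as `parameter_wrapper = rank_mean_diff`; `rank()` with `ties.method = "first"` returns an integer vector, matching the documented return type.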

Value

S4 object of class data_adapt, sub-classed from the container class SummarizedExperiment, with additional slots holding the data-mining selected biomarkers, their TMLE-based differential expression estimates and inference, and the original call to this function (for user reference). The slots describing the selected biomarkers are:

top_index (integer vector) - indices for the data-mining selected biomarkers

top_colname (character vector) - names for the data-mining selected biomarkers

top_colname_significant_q (character vector) - names of the data-mining selected biomarkers that remain significant after the multiple testing stage

DE (numeric vector) - differential expression effect sizes for the biomarkers in top_colname

p_value (numeric vector) - p-values for the biomarkers in top_colname

q_value (numeric vector) - q-values for the biomarkers in top_colname

significant_q (integer vector) - indices into top_colname of the biomarkers that are significant after the multiple testing stage

mean_rank_top (numeric vector) - average rank across cross-validation folds for the biomarkers in top_colname

folds (origami::folds class) - cross-validation folds object
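`mean_rank_top`, and the "mean CV-rank" and "percentage of appearing in top 5" lines printed in the example, can be read as summaries of a biomarker-by-fold rank matrix. The sketch below uses made-up ranks and assumes equal weighting of folds, with "percentage in top `n_top`" meaning the share of folds in which a biomarker's rank is at most `n_top`; the package's internal computation may differ.

```r
# Hypothetical per-fold ranks: rows = biomarkers, columns = CV folds
fold_ranks <- matrix(c(1, 4, 5,
                       1, 4, 8,
                       2, 9, 3),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("bm_a", "bm_b", "bm_c"),
                                     c("fold1", "fold2", "fold3")))
n_top <- 5

mean_rank <- rowMeans(fold_ranks)                   # smaller is better
pct_in_top <- 100 * rowMeans(fold_ranks <= n_top)   # larger is better

mean_rank    # bm_a 3.33, bm_b 4.33, bm_c 4.67
pct_in_top   # bm_a 100, bm_b 66.7, bm_c 66.7
```

The two summaries can disagree: bm_b and bm_c land in the top 5 equally often, but bm_b has the better average rank.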

Examples

library(adaptest)
set.seed(1234)
# loads the example objects `simulated_array` (biomarkers) and `simulated_treatment`
data(simpleArray)

adaptest(Y = simulated_array,
         A = simulated_treatment,
         W = NULL,
         n_top = 5,
         n_fold = 3,
         learning_library = 'SL.glm',
         parameter_wrapper = adaptest::rank_DE,
         absolute = FALSE,
         negative = FALSE)
#> Loading required package: nnls
#> [1] "The top covariates are"
#>           3         5        10         4         6       840       128
#> 1 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000
#> 2 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000
#> 3 0.0000000 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333
#> 4 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> 5 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#>         567       505         1       519
#> 1 0.0000000 0.0000000 0.0000000 0.0000000
#> 2 0.0000000 0.0000000 0.0000000 0.0000000
#> 3 0.3333333 0.0000000 0.0000000 0.0000000
#> 4 0.0000000 0.3333333 0.0000000 0.0000000
#> 5 0.0000000 0.0000000 0.3333333 0.3333333
#> [1] "The ATE estiamtes are"
#> [1]  0.51253445  0.03880535 -0.04632754  0.60661133  0.51471525
#> [1] "The raw p-values are"
#> [1] 0.019986718 0.819997461 0.821458685 0.001454666 0.002002301
#> [1] "The adjusted p-values are"
#> [1] 0.033311197 0.821458685 0.821458685 0.005005752 0.005005752
#> [1] "The top mean CV-rank are (the smaller the better)"
#>  [1]   3.333333   4.333333   5.000000   5.666667  12.000000  15.666667
#>  [7]  43.333333  64.333333 167.000000 194.000000 281.000000
#> [1] "The percentage of appearing in top 5 are (the larger the better)"
#>  [1] 100.00000  66.66667  66.66667  33.33333  33.33333  33.33333  33.33333
#>  [8]  33.33333  33.33333  33.33333  33.33333
#> [1] "The covariates still significant are"
#> [1] 1 4 5
#> [1] "Their compositions are"
#>           3         5        10       505         1       519
#> 1 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
#> 4 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000 0.0000000
#> 5 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
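The "adjusted p-values" printed above are consistent with a Benjamini-Hochberg (false discovery rate) adjustment of the raw p-values. The check below reproduces them with base R's `p.adjust` using the numbers copied from the output; it is a check against the printed values, not a statement about adaptest's internals.

```r
# Raw p-values copied from the example output above
p_raw <- c(0.019986718, 0.819997461, 0.821458685, 0.001454666, 0.002002301)

# Benjamini-Hochberg adjustment from base R's stats package
q <- p.adjust(p_raw, method = "BH")
q

# indices passing the default q_cutoff = 0.05
which(q <= 0.05)   # 1 4 5
```

The indices returned by `which()` match the covariates reported as still significant in the example.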
