Computes marginal average treatment effects of a binary point treatment on multi-dimensional outcomes, adjusting for baseline covariates, using Targeted Minimum Loss-Based Estimation. A data-mining algorithm is used to perform biomarker selection before multiple testing to increase power.

adaptest(Y, A, W = NULL, n_top, n_fold, parameter_wrapper = rank_DE,
  learning_library = c("SL.glm", "SL.step", "SL.glm.interaction",
  "SL.gam", "SL.earth"), absolute = FALSE, negative = FALSE,
  p_cutoff = 0.05, q_cutoff = 0.05)

Arguments

Y

(numeric matrix or data.frame) - binary or continuous biomarker measures (outcome variables). Alternatively, this will be an object of class adapTMLE if the wrapper bioadaptest is invoked (n.b., the wrapper is the preferred interface for standard data analytic use-cases arising in computational and genomic biology).

A

(numeric vector) - binary treatment indicator: 1 = treatment, 0 = control

W

(numeric vector, numeric matrix, or numeric data.frame) - matrix of baseline covariates where each column corresponds to one baseline covariate and each row corresponds to one observation.

n_top

(integer vector) - value for the number of candidate covariates to generate using the data-adaptive estimation algorithm

n_fold

(integer vector) - number of cross-validation folds.

parameter_wrapper

(function) - user-defined function that takes input (Y, A, W, absolute, negative) and outputs an integer vector containing ranks of biomarkers (outcome variables). For details, please refer to the documentation for rank_DE.

learning_library

(character vector) - library of learning algorithms to be used in fitting the "Q" and "g" steps of the standard TMLE procedure.

absolute

(logical) - whether to test for absolute effect size. If FALSE, test for a directional effect. When TRUE, this overrides the argument negative.

negative

(logical) - whether to test for negative effect size. If FALSE, test for positive effect size. This takes effect only when absolute = FALSE.

p_cutoff

(numeric) - p-value cutoff (default 0.05); p-values at or below this value are considered significant. Used in the inference stage.

q_cutoff

(numeric) - q-value cutoff (default 0.05); q-values at or below this value are considered significant. Used in the multiple testing stage.
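As an illustration of the parameter_wrapper interface described above, the following is a minimal sketch of a user-defined ranking function. The name rank_mean_diff is hypothetical and is not part of the package. The sketch ranks biomarkers by an unadjusted difference in means, ignores W, and assumes that rank 1 denotes the biomarker with the strongest evidence in the requested direction. It is not how rank_DE itself is implemented.

```r
# Hypothetical ranking function with the (Y, A, W, absolute, negative)
# signature; rank_mean_diff is an illustrative name, not a package function.
rank_mean_diff <- function(Y, A, W, absolute = FALSE, negative = FALSE) {
  Y <- as.matrix(Y)
  # Unadjusted difference in means (treated minus control) per biomarker.
  # W is accepted for interface compatibility but ignored in this sketch.
  effect <- colMeans(Y[A == 1, , drop = FALSE]) -
    colMeans(Y[A == 0, , drop = FALSE])
  if (absolute) {
    score <- abs(effect)   # absolute = TRUE takes precedence over negative
  } else if (negative) {
    score <- -effect       # large negative effects score highest
  } else {
    score <- effect        # large positive effects score highest
  }
  # Assumption: rank 1 is assigned to the highest score.
  as.integer(rank(-score, ties.method = "first"))
}

# Toy data: 20 observations, 4 biomarkers, effects planted on columns 2 and 3.
set.seed(1)
A <- rep(c(0, 1), each = 10)
Y <- matrix(rnorm(20 * 4), nrow = 20)
Y[A == 1, 2] <- Y[A == 1, 2] + 5  # strong positive effect on biomarker 2
Y[A == 1, 3] <- Y[A == 1, 3] - 5  # strong negative effect on biomarker 3

ranks_pos <- rank_mean_diff(Y, A, W = NULL)
ranks_neg <- rank_mean_diff(Y, A, W = NULL, negative = TRUE)
```

A function of this shape could then be supplied as parameter_wrapper = rank_mean_diff in place of the default rank_DE.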

Value

S4 object of class data_adapt, sub-classed from the container class SummarizedExperiment, with the following additional slots. These hold the biomarkers selected by data mining, their TMLE-based differential expression estimates and inference, and the original call to this function (for user reference).

top_index (integer vector) - indices for the data-mining selected biomarkers

top_colname (character vector) - names for the data-mining selected biomarkers

top_colname_significant_q (character vector) - names for the data-mining selected biomarkers that remain significant after the multiple testing stage

DE (numeric vector) - differential expression effect sizes for the biomarkers in top_colname

p_value (numeric vector) - p-values for the biomarkers in top_colname

q_value (numeric vector) - q-values for the biomarkers in top_colname

significant_q (integer vector) - indices of the entries of top_colname that are significant after the multiple testing stage

mean_rank_top (numeric vector) - average ranking across cross-validation folds for the biomarkers in top_colname

folds (origami::folds class) - cross-validation object

Examples

set.seed(1234)
data(simpleArray)
simulated_array <- simulated_array
simulated_treatment <- simulated_treatment
adaptest(Y = simulated_array,
         A = simulated_treatment,
         W = NULL,
         n_top = 5,
         n_fold = 3,
         learning_library = 'SL.glm',
         parameter_wrapper = adaptest::rank_DE,
         absolute = FALSE,
         negative = FALSE)
#> Loading required package: nnls
#> [1] "The top covariates are"
#>           3         5        10         4         6       840       128
#> 1 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000
#> 2 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000
#> 3 0.0000000 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333
#> 4 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> 5 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#>         567       505         1       519
#> 1 0.0000000 0.0000000 0.0000000 0.0000000
#> 2 0.0000000 0.0000000 0.0000000 0.0000000
#> 3 0.3333333 0.0000000 0.0000000 0.0000000
#> 4 0.0000000 0.3333333 0.0000000 0.0000000
#> 5 0.0000000 0.0000000 0.3333333 0.3333333
#> [1] "The ATE estiamtes are"
#> [1]  0.51253445  0.03880535 -0.04632754  0.60661133  0.51471525
#> [1] "The raw p-values are"
#> [1] 0.019986718 0.819997461 0.821458685 0.001454666 0.002002301
#> [1] "The adjusted p-values are"
#> [1] 0.033311197 0.821458685 0.821458685 0.005005752 0.005005752
#> [1] "The top mean CV-rank are (the smaller the better)"
#>  [1]   3.333333   4.333333   5.000000   5.666667  12.000000  15.666667
#>  [7]  43.333333  64.333333 167.000000 194.000000 281.000000
#> [1] "The percentage of appearing in top 5 are (the larger the better)"
#>  [1] 100.00000  66.66667  66.66667  33.33333  33.33333  33.33333  33.33333
#>  [8]  33.33333  33.33333  33.33333  33.33333
#> [1] "The covariates still significant are"
#> [1] 1 4 5
#> [1] "Their compositions are"
#>           3         5        10       505         1       519
#> 1 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
#> 4 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000 0.0000000
#> 5 0.3333333 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
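
The adjusted p-values printed above are numerically consistent with a Benjamini-Hochberg correction of the raw p-values. The base-R check below is a sketch under that assumption; this page does not state which adjustment method the package applies internally. It uses only stats::p.adjust and the five raw p-values copied from the output.

```r
# Raw p-values copied from the example output above.
raw_p <- c(0.019986718, 0.819997461, 0.821458685, 0.001454666, 0.002002301)

# Assumption: Benjamini-Hochberg adjustment (not confirmed by this page).
adj_p <- p.adjust(raw_p, method = "BH")

# Positions whose adjusted p-value is at or below the q-value cutoff.
q_cutoff <- 0.05
significant <- which(adj_p <= q_cutoff)

print(signif(adj_p, 7))
print(significant)
```

With these inputs, the positions selected at q_cutoff = 0.05 are 1, 4 and 5, which agrees with the "covariates still significant" line in the output.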