Multiple hypothesis testing is an essential component of modern data science. Its goal is to maximize the number of discoveries while controlling the fraction of false discoveries. In many settings, additional information (covariates) is available for each hypothesis beyond its p-value. For example, in eQTL studies, each hypothesis tests the correlation between a variant and the expression of a gene; additional covariates such as the location, conservation, and chromatin status of the variant can inform how likely the association is to be due to noise. However, popular multiple hypothesis testing approaches, such as the Benjamini-Hochberg procedure (BH) and independent hypothesis weighting (IHW), either ignore these covariates or assume the covariate is univariate. We introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. In eQTL analysis of the GTEx data, AdaFDR discovers 32% and 27% more associations than BH and IHW, respectively, at the same false discovery rate. We prove that AdaFDR controls the false discovery proportion (FDP), and show in extensive experiments that it makes substantially more discoveries while controlling the FDR. AdaFDR is computationally efficient: it can process more than 100 million hypotheses within an hour and allows multi-dimensional covariates with both numeric and categorical values. It also provides exploratory plots that help the user interpret how each covariate affects the significance of hypotheses, making it broadly useful across many applications.

The input to AdaFDR is a list of hypotheses, each with a p-value and a set of covariates, whereas the output is a set of selected (also called rejected) hypotheses. For eQTL analysis, each hypothesis is one SNP-gene pair, and the p-value tests for association between their values across samples. The covariates can be the location, conservation, and chromatin status at the SNP and the gene. The standard assumption of AdaFDR and all related methods is that the covariates should not affect the p-values under the null hypothesis (see the Methods section for more discussion of this). AdaFDR learns the covariate-dependent p-value selection threshold by first fitting a mixture model using the expectation-maximization (EM) algorithm, where the mixture model combines a generalized linear model (GLM) and Gaussian mixtures [9-11]. It then makes local adjustments to the p-value threshold by optimizing for more discoveries. We prove that AdaFDR controls FDP under standard statistical assumptions in Theorem 1. AdaFDR is designed to be fast and flexible: it can simultaneously process more than 100 million hypotheses within an hour and allows multi-dimensional covariates with both numeric and categorical values. In addition, AdaFDR provides exploratory plots visualizing how each covariate is related to the significance of hypotheses, allowing users to interpret its findings. We also provide a much faster but slightly less powerful version, AdaFDR-fast, which uses only the EM step and skips the second step of local adjustment.
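For concreteness, here is a minimal sketch of the standard BH step-up procedure that covariate-adaptive methods such as AdaFDR build on. The function name `benjamini_hochberg` is our own; this is an illustration of the textbook procedure, not any package's implementation:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean mask of hypotheses rejected by the BH step-up procedure.

    Rejects the k hypotheses with the smallest p-values, where k is the
    largest index such that p_(k) <= alpha * k / n.
    """
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # Step-up: find the largest k with p_(k) <= alpha * k / n.
    below = sorted_p <= alpha * np.arange(1, n + 1) / n
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1  # number of rejections
        rejected[order[:k]] = True
    return rejected
```

Note that BH applies one global threshold to all hypotheses, which is exactly what covariate-adaptive methods relax.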
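To illustrate the idea of a covariate-dependent threshold, the sketch below bins a single covariate, gives each bin a threshold proportional to its apparent signal density, and scales all thresholds so that a mirror-style estimate of the FDP stays below the target level. This is a simplified stand-in under our own assumptions, not AdaFDR's EM-plus-local-adjustment algorithm; the function name, the binning heuristic, and the search procedure are all ours:

```python
import numpy as np

def adaptive_threshold(pvals, covariate, alpha=0.1, n_bins=10):
    """Toy covariate-adaptive p-value thresholding (illustrative only).

    Hypothesis i is rejected if p_i <= t(x_i), where the per-bin threshold
    t is scaled so that a mirror estimate of the false discovery proportion,
        FDP_hat = #{p_i >= 1 - t(x_i)} / #{p_i <= t(x_i)},
    stays below alpha.
    """
    pvals = np.asarray(pvals, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    # Equal-occupancy bins over the covariate.
    edges = np.quantile(covariate, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(covariate, edges[1:-1]), 0, n_bins - 1)
    # Per-bin signal proxy: smoothed fraction of small p-values.
    counts = np.bincount(idx, minlength=n_bins)
    small = np.bincount(idx, weights=(pvals < 0.05).astype(float),
                        minlength=n_bins)
    shape = (small + 1.0) / (counts + 1.0)
    shape /= shape.mean()

    def evaluate(gamma):
        t = np.minimum(gamma * shape[idx], 0.5)   # threshold t(x_i) per hypothesis
        n_disc = max(int((pvals <= t).sum()), 1)
        n_mirror = int((pvals >= 1.0 - t).sum())  # proxy for false discoveries
        return n_mirror / n_disc, t

    # Simple bisection for the largest global scale gamma with FDP_hat <= alpha
    # (heuristic: treats the estimate as roughly increasing in gamma).
    lo, hi = 0.0, 0.5
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        est, _ = evaluate(mid)
        lo, hi = (mid, hi) if est <= alpha else (lo, mid)
    _, t = evaluate(lo)
    return pvals <= t
```

On data where small p-values concentrate at particular covariate values, this scheme allocates larger thresholds to those bins, mimicking at a high level how a covariate-adaptive method shifts power toward enriched regions while holding the estimated FDP at the target level.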