This work proposes a new formulation for the proportion of true null hypotheses (π₀), based on the sum of all p-values and the average of the expected p-values under the false null hypotheses. The formulation is used to construct a new estimator of π₀ that removes the problem of choosing tuning parameters, which affects existing estimators. Although the formulation is quite general, computing the new estimator requires an initial estimate of π₀; the choice of an appropriate initial estimator is also discussed. The work assumes that each gene expression level is normally distributed and that the same type of test is used for all hypotheses. An extensive simulation study shows that the proposed estimator performs better than its closest competitor, the estimator proposed by Cheng et al. (2015), over a substantial continuous subinterval of the parameter space, under both independence and weak dependence among the gene expression levels. The proposed method is applied to two real gene expression datasets, and the results are in line with those obtained by the competing method.
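The moment identity behind such a formulation can be sketched as follows. If p-values under true nulls are uniform on (0, 1), their expectation is 1/2, so the expected mean p-value equals π₀/2 + (1 − π₀)·e, where e is the average expected p-value under the false nulls; solving for π₀ gives (mean p − e) / (1/2 − e). The function name `pi0_from_mean_pvalue` and the argument `mean_alt_pvalue` are hypothetical, and the value of e is treated as known here, whereas the abstract indicates that it has to be obtained with the help of an initial estimate of π₀. This is an illustration of the identity under those assumptions, not the authors' estimator.

```python
def pi0_from_mean_pvalue(pvalues, mean_alt_pvalue):
    """Moment-based sketch of pi0 from the mean p-value.

    Assumptions (not taken from the paper):
      * null p-values are Uniform(0, 1), so their mean is 0.5;
      * mean_alt_pvalue is the average expected p-value under the
        false nulls, supplied by the caller and assumed to be < 0.5.
    """
    if not pvalues:
        raise ValueError("pvalues must be non-empty")
    if not 0.0 <= mean_alt_pvalue < 0.5:
        raise ValueError("mean_alt_pvalue must lie in [0, 0.5)")

    mean_p = sum(pvalues) / len(pvalues)
    # Solve mean_p = pi0 * 0.5 + (1 - pi0) * mean_alt_pvalue for pi0.
    raw = (mean_p - mean_alt_pvalue) / (0.5 - mean_alt_pvalue)
    # Clip to the parameter space [0, 1].
    return min(1.0, max(0.0, raw))
```

For example, p-values with mean 0.4 and an assumed e of 0.1 give (0.4 − 0.1) / (0.5 − 0.1) = 0.75. No threshold or tuning parameter appears in the expression; the difficulty is shifted to obtaining e.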
The proportion of non-differentially expressed genes is an important quantity in microarray data analysis, and an appropriate estimate of it is used to construct adaptive multiple testing procedures. Most estimators of the proportion of true null hypotheses, whether based on thresholding, maximum likelihood or density estimation, assume independence among the gene expressions. A sparse dependence structure is usually natural when modelling associations in microarray gene expression data, so methods are needed that accommodate sparse dependence within the framework of existing estimators. We propose a clustering-based method that places genes that are not coexpressed in the same group, using the high-dimensional correlation structure, estimated under a sparsity assumption, as the dissimilarity matrix. This novel method is applied to three existing estimators of the proportion of true null hypotheses. An extensive simulation study shows that the proposed method improves an existing estimator by making it less conservative and makes the corresponding adaptive Benjamini-Hochberg algorithm more powerful. The proposed method is applied to a microarray gene expression dataset of colorectal cancer patients, and the results show a gain in the number of differentially expressed genes detected. The R code is available at https://github.com/aniketstat/Proportiontion-of-true-null-under-sparse-dependence-2021 .
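To illustrate why a less conservative estimate of π₀ translates into more power, here is a minimal sketch of the standard adaptive Benjamini-Hochberg step-up procedure, which runs the usual BH rule at level α/π̂₀ instead of α. The function name `adaptive_bh` and its arguments are hypothetical, the estimate `pi0_hat` is taken as given, and nothing here reproduces the clustering method or the R code from the linked repository.

```python
def adaptive_bh(pvalues, pi0_hat, alpha=0.05):
    """Adaptive Benjamini-Hochberg: BH step-up at level alpha / pi0_hat.

    Returns a list of booleans, True where the hypothesis is rejected.
    pi0_hat is an externally supplied estimate of the proportion of
    true nulls (an assumption of this sketch); pi0_hat = 1 gives plain BH.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if not 0.0 < pi0_hat <= 1.0:
        raise ValueError("pi0_hat must lie in (0, 1]")

    level = alpha / pi0_hat
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank k with p_(k) <= k * level / m (step-up rule).
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * level / m:
            k = rank

    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

With p-values `[0.001, 0.012, 0.035, 0.6, 0.9]` and α = 0.05, plain BH (`pi0_hat=1.0`) rejects two hypotheses, while `pi0_hat=0.5` rejects three, since the thresholds double. An estimate of π₀ that is biased upward therefore costs rejections, which is the sense in which a less conservative estimator yields a more powerful procedure.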