We introduce a new estimator for the vector of coefficients $\beta$ in the linear model $y = X\beta + z$, where $X$ has dimensions $n \times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to

$$\min_{b \in \mathbb{R}^p} \frac{1}{2}\|y - Xb\|_{\ell_2}^2 + \lambda_1 |b|_{(1)} + \lambda_2 |b|_{(2)} + \cdots + \lambda_p |b|_{(p)},$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and $|b|_{(1)} \ge |b|_{(2)} \ge \cdots \ge |b|_{(p)}$ are the decreasing absolute values of the entries of $b$. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical $\ell_1$ procedures such as the Lasso. Here, the regularizer is a sorted $\ell_1$ norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300] procedure (BH), which compares more significant p-values with more stringent thresholds. One notable choice of the sequence $\{\lambda_i\}$ is given by the BH critical values $\lambda_{\mathrm{BH}}(i) = z(1 - i \cdot q/(2p))$, where $q \in (0, 1)$ and $z(\alpha)$ is the $\alpha$-quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with $\lambda_{\mathrm{BH}}$ provably controls FDR at level $q$. Moreover, it also appears to have appreciable inferential properties under more general designs $X$ while having substantial power, as demonstrated in a series of experiments running on both simulated and real data.
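To make the SLOPE objective above concrete, the following is a minimal Python sketch that evaluates the BH critical values and the sorted ℓ1 penalty for a given coefficient vector. It only evaluates the objective; it is not the solution algorithm referred to in the abstract. The function names (`bh_lambdas`, `sorted_l1_penalty`, `slope_objective`) are hypothetical and chosen for illustration.

```python
from statistics import NormalDist


def bh_lambdas(p, q):
    """Return the BH critical values z(1 - i*q/(2p)) for i = 1..p.

    z is the quantile function of the standard normal distribution.
    The function name is an illustrative assumption, not from the paper.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(1.0 - i * q / (2.0 * p)) for i in range(1, p + 1)]


def sorted_l1_penalty(b, lambdas):
    """Sum of lambda_i * |b|_(i), where |b|_(1) >= ... >= |b|_(p)."""
    abs_sorted = sorted((abs(x) for x in b), reverse=True)
    return sum(lam * a for lam, a in zip(lambdas, abs_sorted))


def slope_objective(y, X, b, lambdas):
    """Evaluate 0.5 * ||y - Xb||^2 + sorted-l1 penalty (X is a list of rows)."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, b)) for yi, row in zip(y, X)
    ]
    return 0.5 * sum(r * r for r in residuals) + sorted_l1_penalty(b, lambdas)


if __name__ == "__main__":
    lambdas = bh_lambdas(p=5, q=0.1)
    print(lambdas)  # non-increasing sequence of positive thresholds
    print(sorted_l1_penalty([1.0, -3.0, 2.0], [3.0, 2.0, 1.0]))
```

Because the largest absolute coefficient is paired with the largest λ, the penalty depends on the ranks of the entries of b rather than on their positions, which is what distinguishes it from the ordinary ℓ1 penalty with a single λ.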
Within a Bayesian decision theoretic framework we investigate some asymptotic optimality properties of a large class of multiple testing rules. A parametric setup is considered, in which observations come from a normal scale mixture model and the total loss is assumed to be the sum of losses for individual tests. Our model can be used for testing point null hypotheses, as well as to distinguish large signals from a multitude of very small effects. A rule is defined to be asymptotically Bayes optimal under sparsity (ABOS), if within our chosen asymptotic framework the ratio of its Bayes risk and that of the Bayes oracle (a rule which minimizes the Bayes risk) converges to one. Our main interest is in the asymptotic scheme where the proportion p of "true" alternatives converges to zero. We fully characterize the class of fixed threshold multiple testing rules which are ABOS, and hence derive conditions for the asymptotic optimality of rules controlling the Bayesian False Discovery Rate (BFDR). We finally provide conditions under which the popular Benjamini-Hochberg (BH) and Bonferroni procedures are ABOS and show that for a wide class of sparsity levels, the threshold of the former can be approximated by a nonrandom threshold. It turns out that while the choice of asymptotically optimal FDR levels for BH depends on the relative cost of a type I error, it is almost independent of the level of sparsity. Specifically, we show that when the number of tests m increases to infinity, then BH with FDR level chosen in accordance with the assumed loss function is ABOS in the entire range of sparsity parameters $p \propto m^{-\beta}$, with $\beta \in (0, 1]$.

1. Introduction. Multiple testing has emerged as a very important problem in statistical inference because of its applicability in understanding large data sets involving many parameters.
A prominent area of the application of multiple testing is microarray data analysis, where one wants to simultaneously test expression levels of thousands of genes (see, e.g., [18, 19, 24, 31, 34, 35, 41] or [42]). Various ways of performing multiple tests have been proposed in the literature over the years, typically differing in their objective. Among the most popular classical multiple testing procedures, one finds the Bonferroni correction, aimed at controlling the family wise error rate (FWER) and the Benjamini-Hochberg procedure [2], which controls the false discovery rate (FDR). A wide range of empirical Bayes (e.g., see [6, 17–19] and [44]) and full Bayes tests (see, e.g., [6, 12, 31] and [35]) have also been proposed and are used extensively in such problems. In the classical setting, a multiple testing procedure is considered to be optimal if it maximizes the number of true discoveries, while keeping one of the type I error measures (like FWER, FDR or the expected number of false positives) at a certain, fixed level. In this context, it is shown in [25] that the Benjamini-Hochberg procedure (henceforth called BH) is optimal within a large class of step-up multiple testing procedures controlling FDR. In r...
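For readers unfamiliar with the two classical procedures named in this excerpt, the following is a minimal Python sketch of the standard textbook forms of the Benjamini-Hochberg step-up rule and the Bonferroni correction. It is not taken from the paper, and the function names and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q):
    """Standard BH step-up rule at FDR level q.

    Finds the largest rank k such that the k-th smallest p-value is
    <= k*q/m, then rejects the k hypotheses with the smallest p-values.
    Returns a list of booleans in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha):
    """Bonferroni correction: reject when p <= alpha/m (controls FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


if __name__ == "__main__":
    pvals = [0.5, 0.035, 0.01, 0.03]  # made-up p-values for illustration
    print(benjamini_hochberg(pvals, q=0.05))
    print(bonferroni(pvals, alpha=0.05))
```

In this example the p-value 0.03 exceeds its own BH threshold (2 × 0.05 / 4 = 0.025) but is still rejected, because the third smallest p-value, 0.035, passes its threshold of 0.0375. This is the step-up behaviour that makes BH less conservative than Bonferroni, which here rejects only the hypothesis with p-value 0.01.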
The problem of locating multiple interacting quantitative trait loci (QTL) can be addressed as a multiple regression problem, with marker genotypes being the regressor variables. An important and difficult part in fitting such a regression model is the estimation of the QTL number and respective interactions. Among the many model selection criteria that can be used to estimate the number of regressor variables, none are used to estimate the number of interactions. Our simulations demonstrate that epistatic terms appearing in a model without the related main effects cause the standard model selection criteria to have a strong tendency to overestimate the number of interactions, and so the QTL number. With this as our motivation we investigate the behavior of the Schwarz Bayesian information criterion (BIC) by explaining the phenomenon of the overestimation and proposing a novel modification of BIC that allows the detection of main effects and pairwise interactions in a backcross population. Results of an extensive simulation study demonstrate that our modified version of BIC performs very well in practice. Our methodology can be extended to general populations and higher-order interactions.

POPULAR methods for mapping quantitative trait loci (QTL) include interval mapping (Lander and Botstein 1989), composite interval mapping (Zeng 1993, 1994) and multiple QTL mapping (MQM; Jansen 1993; Jansen and Stam 1994). These statistical methods do not allow the location of QTL in situations when there are no main effects for the respective QTL, but there are (epistatic) interactions with other QTL (genes) that influence the quantitative trait. Epistatic QTL are known to play important roles in many disease studies, such as cancer (Fijneman et al. 1996, 1998), and it is also suspected that they play a key role in the evolutionary process (Wolf et al. 2000).

A direct solution to detecting epistatic QTL is to search for several QTL simultaneously and fit an appropriate multiple regression model with interactions. However, the utility of such an approach, which is referred to as a multidimensional version of interval map...

...sional genome searches as a means of mapping epistatic QTL. In particular they proposed an interesting extension of MQM by addressing a crucial problem pertaining to the choice of marker cofactors. By including all available markers in a regression equation and using a Bayesian approach to penalize large values of the corresponding regression coefficients many of the previously mentioned issues are eliminated. The disadvantage of this method is that, when detecting epistatic QTL, it requires the choice of "the effective dimension" (i.e., number of QTL) for epistatic interactions, which has strong influence on the power of detection.

An alternative way to approach the problem of mapping epistatic QTL relies on developing new methods for reducing the numerical complexity of MIM. In recent work Carlborg et al. (2000), Nakamichi et al. (2001), and Broman and Speed (2002) use random searc...