Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find whether any SNPs are associated with some traits, and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a novel method based on principal factor approximation, which successfully subtracts the common dependence and significantly weakens the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used and provide a consistent estimate of the realized FDP. This result has important applications in controlling FDR and FDP. Our estimate of the realized FDP compares favorably with the approach of Efron (2007), as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed threshold procedure.
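The idea of subtracting the common dependence can be illustrated with a small numerical sketch. This is not the paper's procedure: it assumes a hypothetical one-factor covariance matrix (the loadings `b`, the dimension `p`, and the number of removed components `k` are all made up for illustration) and only shows that removing the leading eigen-component of the covariance leaves a residual with much weaker off-diagonal dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 200                                        # number of test statistics (assumed)
b = rng.uniform(-0.8, 0.8, size=p)             # hypothetical factor loadings
Sigma = np.outer(b, b) + np.diag(1.0 - b**2)   # covariance with unit variances


def mean_abs_offdiag(M):
    """Average absolute value of the off-diagonal entries of a square matrix."""
    off = M - np.diag(np.diag(M))
    n = M.shape[0]
    return np.abs(off).sum() / (n * (n - 1))


# Spectral decomposition; eigh returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(Sigma)

k = 1                                          # number of principal factors removed (assumed)
top_vals, top_vecs = vals[-k:], vecs[:, -k:]
common = (top_vecs * top_vals) @ top_vecs.T    # part explained by the top k factors
residual = Sigma - common                      # what remains after subtracting it

before = mean_abs_offdiag(Sigma)
after = mean_abs_offdiag(residual)
print(f"mean |off-diagonal| before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the residual matrix is close to diagonal, which is the sense in which the remaining dependence is weak. Choosing `k` and estimating the realized factors from data are separate steps that this sketch does not attempt.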
In this paper, we develop tests for structural breaks of factor loadings in dynamic factor models. We focus on the joint null hypothesis that all factor loadings are constant over time. Because the number of factor loading parameters goes to infinity as the sample size grows, conventional tests cannot be used. A structural change in factor loadings yields a structural change in the second moments of the factors obtained from full-sample principal component estimation. Based on this fact, we reduce the infinite-dimensional problem to a finite-dimensional one, and our statistic compares the pre- and postbreak subsample second moments of the estimated factors. Our test is consistent under the alternative hypothesis in which a fraction of or all factor loadings have structural changes. The Monte Carlo results show that our test has good finite-sample size and power.
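The quantity this abstract describes, the gap between pre- and postbreak second moments of factors estimated from the full sample, can be sketched as follows. This is a minimal illustration under assumed settings, not the paper's test: the simulated design, the helper names (`pca_factors`, `second_moment_gap`, `simulate_panel`), the choice of two estimated factors, and the known break date are all hypothetical, and no variance estimate or critical value is computed.

```python
import numpy as np


def pca_factors(X, k):
    """Estimate k factors from a T x N panel by principal components,
    normalised so that F'F / T is the identity matrix."""
    T = X.shape[0]
    _, vecs = np.linalg.eigh(X @ X.T)      # eigenvalues in ascending order
    return np.sqrt(T) * vecs[:, -k:]       # eigenvectors of the k largest eigenvalues


def second_moment_gap(F, split):
    """Frobenius norm of the difference between the pre- and post-split
    second-moment matrices of the estimated factors."""
    pre, post = F[:split], F[split:]
    S_pre = pre.T @ pre / len(pre)
    S_post = post.T @ post / len(post)
    return np.linalg.norm(S_pre - S_post)


def simulate_panel(rng, T, N, break_at=None):
    """One-factor panel; if break_at is given, all loadings are redrawn from that date on."""
    f = rng.standard_normal(T)
    loadings = np.tile(rng.standard_normal(N), (T, 1))
    if break_at is not None:
        loadings[break_at:] = rng.standard_normal(N)
    return f[:, None] * loadings + rng.standard_normal((T, N))


rng = np.random.default_rng(0)
T, N, split = 400, 100, 200

X_stable = simulate_panel(rng, T, N)
X_break = simulate_panel(rng, T, N, break_at=split)

gap_stable = second_moment_gap(pca_factors(X_stable, 2), split)
gap_break = second_moment_gap(pca_factors(X_break, 2), split)
print(f"gap without break: {gap_stable:.3f}, gap with break: {gap_break:.3f}")
```

The gap is small when loadings are constant and large when they change, which is the finite-dimensional signal a formal statistic would be built on. Two factors are estimated here because a break in the loadings of a single factor shows up as an additional factor in full-sample principal components.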
Large-scale multiple testing with highly correlated test statistics arises frequently in scientific research. Incorporating correlation information in estimating the false discovery proportion has attracted increasing attention in recent years. When the covariance matrix of test statistics is known, Fan, Han & Gu (2012) provided a consistent estimate of the False Discovery Proportion (FDP) under arbitrary dependence structure. However, the covariance matrix is often unknown in applications, and such dependence information has to be estimated before estimating FDP (Efron, 2010). The estimation accuracy can greatly affect the convergence result of FDP or even violate its consistency. In the current paper, we provide methodological modification and theoretical investigations for estimation of FDP with unknown covariance. First, we develop requirements for estimates of eigenvalues and eigenvectors such that we can obtain a consistent estimate of FDP. Second, we give conditions on the dependence structures such that the estimate of FDP is consistent. Such dependence structures include sparse covariance matrices, which have been popularly considered in contemporary random matrix theory. When data are sampled from an approximate factor model, which encompasses most practical situations, we provide a consistent estimate of FDP by exploiting this specific dependence structure. The results are further demonstrated by simulation studies and some real data applications.
This article proposes a group bridge estimator to select the correct number of factors in approximate factor models. It contributes to the literature on shrinkage estimation and factor models by extending the conventional bridge estimator from a single equation to a large panel context. The proposed estimator can consistently estimate the factor loadings of relevant factors and shrink the loadings of irrelevant factors to zero with a probability approaching one. Hence, it provides a consistent estimate for the number of factors. We also propose an algorithm for the new estimator; Monte Carlo experiments show that our algorithm converges reasonably fast and that our estimator has very good performance in small samples. An empirical example is also presented based on a commonly used U.S. macroeconomic dataset.