This article concerns tests for sphericity when data dimension is larger than the sample size. The existing multivariate-sign-based procedure (Hallin & Paindaveine, 2006) for sphericity is not robust against high dimensionality, producing tests with type I error rates much larger than nominal levels. This is mainly due to bias from estimating the location parameter. We develop a correction that makes the existing test statistic robust against high dimensionality. We show that the proposed test statistic is asymptotically normal under elliptical distributions. The proposed method allows dimensionality to increase as the square of sample size. Simulations show that it has good size and power for a wide range of settings.
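To make the multivariate-sign idea concrete, here is a minimal sketch of the classical (uncorrected) sign-based sphericity statistic, written as n p(p+2)/2 times tr(Ω²) − 1/p, where Ω is the average outer product of the spatial signs. It assumes the location is known and passed in by the caller; the function names are hypothetical, and the high-dimensional bias correction proposed in the abstract is not implemented here.

```python
import math


def spatial_signs(rows, location):
    """Project each centered observation onto the unit sphere."""
    signs = []
    for row in rows:
        centered = [x - m for x, m in zip(row, location)]
        norm = math.sqrt(sum(c * c for c in centered))
        if norm == 0.0:
            raise ValueError("observation coincides with the location")
        signs.append([c / norm for c in centered])
    return signs


def sign_sphericity_statistic(rows, location):
    """Classical sign statistic, assuming a known location (illustrative only).

    Computes p(p+2)/(2n) * sum over all pairs (i, j), including i == j,
    of ((U_i . U_j)^2 - 1/p), which equals n p(p+2)/2 * (tr(Omega^2) - 1/p).
    """
    signs = spatial_signs(rows, location)
    n, p = len(signs), len(signs[0])
    total = 0.0
    for u in signs:
        for v in signs:
            dot = sum(a * b for a, b in zip(u, v))
            total += dot * dot - 1.0 / p
    return p * (p + 2) / (2.0 * n) * total


# Signs spread evenly over the axes give a statistic of zero.
balanced = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
q_balanced = sign_sphericity_statistic(balanced, [0.0, 0.0])

# Signs concentrated on one direction give a large statistic.
aligned = [[2.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
q_aligned = sign_sphericity_statistic(aligned, [0.0, 0.0])
```

In practice the location must be estimated, which is the source of the bias the abstract describes when the dimension exceeds the sample size.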
This paper considers distributed statistical inference for a general type of statistics that encompasses the U-statistics and the M-estimators in the context of massive data where the data can be stored at multiple platforms at different locations. In order to facilitate effective computation and to avoid expensive data communication among different platforms, we formulate distributed statistics which can be computed over smaller data blocks. The statistical properties of the distributed statistics are investigated in terms of the mean square error of estimation and their asymptotic distributions with respect to the number of data blocks. In addition, we propose two distributed bootstrap algorithms which are computationally effective and are able to capture the underlying distribution of the distributed statistics. Numerical simulation and real data applications of the proposed approaches are provided to demonstrate the empirical performance.
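One way to picture a statistic "computed over smaller data blocks" is to evaluate a U-statistic separately on each block and average the results. The sketch below assumes equal-size blocks, an order-2 kernel, and simple averaging as the combining rule; these choices and all function names are illustrative assumptions, not necessarily the construction used in the paper.

```python
import random
from itertools import combinations
from statistics import mean


def u_statistic(block, kernel):
    """Order-2 U-statistic: average of the kernel over all unordered pairs."""
    pairs = list(combinations(block, 2))
    if not pairs:
        raise ValueError("a block needs at least two observations")
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def split_into_blocks(data, n_blocks):
    """Split data into equal-size blocks, dropping any remainder."""
    block_size = len(data) // n_blocks
    return [data[k * block_size:(k + 1) * block_size] for k in range(n_blocks)]


def distributed_u_statistic(data, n_blocks, kernel):
    """Average of block-wise U-statistics (assumed combining rule)."""
    blocks = split_into_blocks(data, n_blocks)
    return mean(u_statistic(block, kernel) for block in blocks)


def variance_kernel(x, y):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(12)]

full_sample = u_statistic(data, variance_kernel)
distributed = distributed_u_statistic(data, 3, variance_kernel)
```

Each block needs only its own observations, so the pairwise work drops from all pairs in the full sample to pairs within blocks, and only one number per block has to be communicated.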
This paper considers improving the power of tests for the identity and sphericity hypotheses regarding high-dimensional covariance matrices. The power improvement is achieved by employing the banding estimator for the covariance matrices, which leads to a significant reduction in the variance of the test statistics in high dimensions. Theoretical justification and simulation experiments are provided to establish the validity of the proposed tests. As an illustration, the tests are used to analyze a dataset from an acute lymphoblastic leukemia gene expression study.
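For readers unfamiliar with banding, the sketch below shows the basic operation: keep the sample covariance entries within k positions of the diagonal and set the rest to zero. It assumes the variables have a natural ordering and that the bandwidth k is supplied by the caller; the function names are hypothetical, and the way the paper plugs the banded estimator into its test statistics is not reproduced here.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of n observations in p dimensions."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def band(matrix, k):
    """Zero out every entry more than k positions away from the diagonal."""
    p = len(matrix)
    return [
        [matrix[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
        for i in range(p)
    ]


observations = [
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 2.5],
    [4.0, 3.0, 0.0],
]
covariance = sample_covariance(observations)
banded = band(covariance, 1)  # keeps the diagonal and first off-diagonals
```

Discarding the far off-diagonal entries removes their estimation noise, which is the intuition behind the variance reduction mentioned in the abstract.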