We consider two alternative tests to the Higher Criticism test of Donoho and Jin [Ann. Statist. 32 (2004) 962-994] for high-dimensional means under the sparsity of the nonzero means for sub-Gaussian distributed data with unknown column-wise dependence. The two alternative test statistics are constructed by first thresholding L1 and L2 statistics based on the sample means, respectively, and then maximizing over a range of thresholding levels to make the tests adaptive to the unknown signal strength and sparsity. The two alternative tests can attain the same detection boundary as the Higher Criticism test in [Ann. Statist. 32 (2004) 962-994], which was established for uncorrelated Gaussian data. It is demonstrated that the maximal L2-thresholding test is at least as powerful as the maximal L1-thresholding test, and both the maximal L2- and L1-thresholding tests are at least as powerful as the Higher Criticism test.
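As a rough illustration of the thresholding step described above, the following Python sketch computes L1 and L2 thresholded sums over a grid of thresholding levels. It is a simplified sketch, not the authors' procedure: the function name `thresholded_sums`, the input `z` (assumed to be standardized coordinate-wise statistics), and the threshold parameterization `2 * s * log(p)` are all assumptions made for illustration, and the null standardization and maximization over levels that define the actual tests are omitted.

```python
import math


def thresholded_sums(z, s_grid):
    """Return (s, L1 sum, L2 sum) for each thresholding level s.

    z      : list of standardized coordinate-wise statistics (hypothetical
             input, e.g. sqrt(n) * sample mean / standard deviation).
    s_grid : thresholding levels; the threshold 2 * s * log(p) applied to
             z**2 is an assumed parameterization, not taken from the source.
    """
    p = len(z)
    out = []
    for s in s_grid:
        lam = 2.0 * s * math.log(p)
        # Keep only coordinates whose squared statistic exceeds the threshold.
        kept = [v for v in z if v * v > lam]
        l1 = sum(abs(v) for v in kept)   # L1-type thresholded sum
        l2 = sum(v * v for v in kept)    # L2-type thresholded sum
        out.append((s, l1, l2))
    return out


# Toy input: two large coordinates among mostly small ones.
z = [0.1, -0.3, 3.0, 0.2, -4.0, 0.5, 0.0, 1.0]
result = thresholded_sums(z, [0.5, 1.0])
```

In this toy input only the two large coordinates survive both thresholds, so the small coordinates contribute no noise to either sum.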
Single‐variant‐based genome‐wide association studies have successfully detected many genetic variants that are associated with a number of complex traits. However, their power is limited because marginal signals are weak and potential complex interactions among genetic variants are ignored. The set‐based strategy was proposed as a remedy: multiple genetic variants in a given set (e.g., gene or pathway) are jointly evaluated, so that the systematic effect of the set is considered. The kernel‐based testing (KBT) framework is one of the most popular and powerful methods in set‐based association studies. Given a set of candidate kernels, one proposed approach is to choose the kernel with the smallest p‐value. Such a method, however, can yield inflated Type 1 error, especially when the number of variants in a set is large. Alternatively, one can obtain p‐values by permutation, which can be very time‐consuming. In this study, we propose an efficient testing procedure for quantitative trait association studies that not only controls the Type 1 error rate but also has power close to that obtained under the optimal kernel in the candidate kernel set. Our method, a maximum kernel‐based U‐statistic method, is built upon the KBT framework and is based on asymptotic results under a high‐dimensional setting. Hence it can efficiently deal with the case where the number of variants in a set is much larger than the sample size. Both simulation and real data analysis demonstrate the advantages of the method compared with its counterparts.
This paper considers testing the equality of two high dimensional means. Two approaches are utilized to formulate L2-type tests for better power performance when the two high dimensional mean vectors differ only in sparsely populated coordinates and the differences are faint. One is to conduct thresholding to remove the non-signal bearing dimensions for variance reduction of the test statistics. The other is to transform the data via the precision matrix for signal enhancement. It is shown that the thresholding and data transformation lead to attractive detection boundaries for the tests. Furthermore, we demonstrate explicitly the effects of precision matrix estimation on the detection boundary for the test with thresholding and data transformation. Extension to multi-sample ANOVA tests is also investigated. Numerical studies are performed to confirm the theoretical findings and demonstrate the practical implementations.
This paper develops a unified test procedure for nonparametric functions in a reproducing kernel Hilbert space (RKHS) of high-dimensional or functional covariates. The test procedure is simple, computationally efficient and practical because we do not need to distinguish between high-dimensional and functional covariates. We derive the asymptotic distributions of the proposed test statistic under the null and a series of local alternative hypotheses. The asymptotic distributions depend on the decay rate of eigenvalues of the kernel function, which is determined by the kernel function and types of covariates. We also develop a novel kernel selection procedure to maximize the power of the proposed test via maximizing the signal-to-noise ratio. The proposed kernel selection procedure is shown to be consistent in selecting the kernels that maximize the power function. Moreover, a test with a regularized kernel is constructed to further improve the power. It is shown that the proposed test could nearly achieve the power of an oracle test if the regularization parameter is properly chosen. Extensive simulation studies were conducted to evaluate the finite sample performance of the proposed method. We applied the proposed method to a Yorkshire gilt data set to identify pathways that are associated with the triiodothyronine level. The proposed methods are included in an R package "KerUTest".
Objective: To investigate quantitative imaging markers based on parameters from two diffusion-weighted imaging (DWI) models, continuous-time random-walk (CTRW) and intravoxel incoherent motion (IVIM) models, for characterizing malignant and benign breast lesions by using a machine learning algorithm. Approach: With IRB approval, 40 women with histologically confirmed breast lesions (16 benign, 24 malignant) underwent DWI with 11 b-values (50 to 3000 s/mm²) at 3T. Three CTRW parameters (Dm, α, and β) and three IVIM parameters (Ddiff, Dperf, and f) were estimated from the lesions. A histogram was generated for each parameter from the regions of interest, and the following histogram features were extracted: skewness, variance, mean, median, interquartile range, and the values of the 10%, 25%, and 75% quantiles. Feature significance was assessed using the Boruta algorithm, with Benjamini-Hochberg false discovery rate and Bonferroni corrections for hypothesis testing. Predictive performance of the significant features was evaluated using Support Vector Machine, Random Forest, Naïve Bayes, Gradient Boosted Classifier (GB), Decision Trees, AdaBoost and Gaussian Process machine learning classifiers. Main Results: The 75% quantile and median of Dm; 75% quantile of f; mean, median, and skewness of β; kurtosis of Dperf; and 75% quantile of Ddiff were the most significant features. The GB differentiated malignant and benign lesions with an accuracy of 0.833, an area-under-the-curve of 0.942, and an F1 score of 0.87, providing the best statistical performance (p-value < 0.05) compared to the other classifiers. Significance: Our study has demonstrated that GB with a set of histogram features from the CTRW and IVIM model parameters can effectively differentiate malignant and benign breast lesions.
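The histogram features listed in the Approach could be computed along the following lines. This is a minimal sketch using only the Python standard library, under stated assumptions: the function name `histogram_features`, the use of population variance, the moment-based skewness formula, and the "inclusive" quantile method are choices made here for illustration and are not specified in the abstract.

```python
import statistics


def histogram_features(values):
    """Summary features for one parameter map within a region of interest.

    values: list of voxel-wise parameter values (hypothetical input).
    Population variance and moment-based skewness are assumed conventions.
    """
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    sd = var ** 0.5
    # Third standardized moment; defined as 0.0 for constant input.
    skew = sum((v - mean) ** 3 for v in values) / n / sd ** 3 if sd > 0 else 0.0
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": var,
        "skewness": skew,
        "q10": deciles[0],
        "q25": quartiles[0],
        "q75": quartiles[2],
        "iqr": quartiles[2] - quartiles[0],
    }


# Toy example with a symmetric set of values.
features = histogram_features([1, 2, 3, 4, 5])
```

Each of the six model parameters would yield one such feature dictionary per lesion, and the resulting feature vectors would then be passed to the feature selection and classification steps.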