Summary: In high-dimensional model selection problems, penalized least-squares approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods and proposes a data-driven weighted linear combination of convex loss functions, together with a weighted L1 penalty. The procedure is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1 penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1 penalty. In the setting where the dimensionality is much larger than the sample size, we establish a strong oracle property of the proposed method, which possesses both model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust composite L1-L2 method and an optimal composite quantile method, and evaluate their performance in both simulated and real data examples.
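As an illustrative sketch (not the authors' data-driven procedure), the composite L1-L2 idea can be written as minimizing a convex combination of absolute and squared residual losses plus a weighted L1 penalty. The weights `w`, `lam`, and `pen_w` below are fixed by hand for illustration, whereas the paper chooses them adaptively from the data:

```python
import numpy as np
from scipy.optimize import minimize

# Toy sparse regression problem
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 1.0])
y = X @ beta_true + rng.normal(size=n)

def composite_objective(beta, w=0.5, lam=0.1, pen_w=None):
    """Convex combination of L1 and L2 losses plus a weighted L1 penalty.

    w, lam and pen_w are illustrative constants here; the paper's method
    selects the loss weights and penalty weights adaptively.
    """
    if pen_w is None:
        pen_w = np.ones_like(beta)  # unit penalty weights for the sketch
    r = y - X @ beta
    loss = w * np.mean(np.abs(r)) + (1 - w) * np.mean(r ** 2)
    return loss + lam * np.sum(pen_w * np.abs(beta))

# Powell copes with the non-smooth objective on this small example
fit = minimize(composite_objective, np.zeros(p), method="Powell")
print(np.round(fit.x, 2))
```

Since both loss components and the penalty are convex in `beta`, the combined objective stays convex for any `w` in [0, 1], which is the point of combining losses rather than penalties.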
High-throughput genetic sequencing arrays with thousands of measurements per sample, together with a great amount of related censored clinical data, have increased the demand for better measurement-specific model selection. In this paper we establish strong oracle properties of non-concave penalized methods for non-polynomial (NP) dimensional data with censoring, in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed. It is demonstrated that non-concave penalties lead to a significant relaxation of the "irrepresentable condition" needed for LASSO model selection consistency. A large deviation result for martingales, of independent interest, is developed to characterize the strong oracle property. Moreover, the non-concave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.
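For concreteness, the SCAD member of the folded-concave family has a standard closed form (Fan and Li's penalty, with the conventional choice a = 3.7). This is the textbook formula, not code from the paper:

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD folded-concave penalty, applied elementwise.

    Linear near zero (like the L1 penalty), quadratic in a transition
    region, then constant -- so large coefficients are not biased.
    """
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b
    quad = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    const = lam ** 2 * (a + 1) / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, quad, const))
```

The penalty is continuous at the two knots |t| = λ and |t| = aλ and flat beyond aλ, which is what removes the bias that a pure L1 penalty puts on large coefficients while keeping L1-style sparsity near zero.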
We propose a methodology for testing linear hypotheses in high-dimensional linear models. The proposed test does not impose any restriction on the size of the model, i.e., model sparsity, or on the loading vector representing the hypothesis. Providing asymptotically valid methods for testing general linear functions of the regression parameters in high dimensions is extremely challenging, especially without making restrictive or unverifiable assumptions on the number of non-zero elements. We propose to test the moment conditions related to a newly designed restructured regression, where the inputs are transformed and augmented features. These new features incorporate the structure of the null hypothesis directly. The test statistics are constructed in such a way that lack of sparsity in the original model parameter does not present a problem for the theoretical justification of our procedures. We establish asymptotically exact control of the Type I error without imposing any sparsity assumptions on the model parameter or on the vector representing the linear hypothesis. Our method is also shown to achieve certain optimality in detecting deviations from the null hypothesis. We demonstrate the favorable finite-sample performance of the proposed methods via a number of numerical examples and a real data example.
Many popular methods for building confidence intervals on causal effects under high-dimensional confounding require strong "ultra-sparsity" assumptions that may be difficult to validate in practice. To alleviate this difficulty, we here study a new method for average treatment effect estimation that yields asymptotically exact confidence intervals assuming that either the conditional response surface or the conditional probability of treatment allows for an ultra-sparse representation (but not necessarily both). This guarantee allows us to provide valid inference for average treatment effect in high dimensions under considerably more generality than available baselines. In addition, we showcase that our results are semi-parametrically efficient.
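A standard estimator with this "either nuisance model may be wrong" guarantee is the augmented inverse-propensity-weighted (AIPW, doubly robust) average treatment effect estimator. The sketch below assumes fitted response surfaces `mu1`, `mu0` and propensity scores `e` are supplied by some upstream (e.g. high-dimensional) regression; it is a generic illustration, not the paper's exact procedure:

```python
import numpy as np

def aipw_ate(y, t, mu1, mu0, e):
    """Doubly robust (AIPW) average treatment effect estimate.

    y   : observed outcomes
    t   : binary treatment indicator (0/1)
    mu1 : fitted E[Y | X, T=1]; mu0 : fitted E[Y | X, T=0]
    e   : fitted propensity scores P(T=1 | X)

    The estimate is consistent if either the outcome model (mu1, mu0)
    or the propensity model (e) is correctly specified.
    """
    return np.mean(mu1 - mu0
                   + t * (y - mu1) / e
                   - (1 - t) * (y - mu0) / (1 - e))
```

The regression term `mu1 - mu0` supplies efficiency when the outcome model is good, while the inverse-propensity-weighted residual terms correct its bias when it is not, which is the source of the either/or guarantee described above.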
The purpose of this paper is to construct confidence intervals for the regression coefficients in high-dimensional Cox proportional hazards regression models, where the number of covariates may be larger than the sample size. Our debiased estimator construction is similar to those in Zhang and Zhang (2014) and van de Geer et al. (2014), but the time-dependent covariates and censored risk sets introduce considerable additional challenges. Our theoretical results, which provide conditions under which our confidence intervals are asymptotically valid, are supported by extensive numerical experiments.

The aim is to provide asymptotically valid confidence intervals for components of β^o (or, more generally, for linear combinations c^T β^o, for some fixed c ∈ R^p). Our interest in this paper lies in providing corresponding confidence intervals in the high-dimensional regime, where p may be much larger than n. The motivation for such methodology arises from many different application areas, but particularly from biomedicine, where Cox models are ubiquitous and data on each individual, which may arise in the form of combinations of genetic information, greyscale values for each pixel in a scan and many other types, are often plentiful. Our construction begins with the Lasso penalised partial likelihood estimator β̂ studied in Huang et al. (2013), which is used as an initial, sparse estimator. We then seek a sparse estimator of the inverse of the negative Hessian matrix, which we will refer to as a sparse precision matrix estimator. In Zhang and Zhang (2014) and van de Geer et al. (2014), who consider similar problems in the linear and generalised linear model settings respectively, this sparse precision matrix estimator is constructed via nodewise Lasso regression (Meinshausen and Bühlmann, 2006).
On the other hand, Javanmard and Montanari (2013) and Javanmard and Montanari (2014) derived their precision matrix estimators by minimising the trace of the product of the sample covariance matrix and the precision matrix, where the covariates are assumed to be centred. However, in the Cox model setting, the counterpart of the design matrix is a mean-shifted design matrix, where the mean is based on a set of tilting weights, and this destroys the necessary independence structure. Instead, we adopt a modification of the CLIME estimator (Cai et al., 2011) as the sparse precision matrix estimator, which allows us to handle the mean subtraction. Adjusting β̂ by the product of our sparse precision matrix estimator and the score vector yields a debiased estimator b̂, and our main theoretical result (Theorem 1) provides conditions under which c^T b̂ is asymptotically normally distributed around c^T β^o. The desired confidence intervals can then be obtained straightforwardly. Further very recent applications of the debiasing idea, outside the regression problem context, can be found in Janková and van de Geer (2018a) and Janková and van de Geer (2018b). The formidable theoretical challenges involved in proving the asymptotic normality of c^T b̂ arise in part from our des...
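The one-step correction at the heart of these debiasing constructions can be sketched generically: add the precision-matrix-weighted score to the initial penalized estimator. In the Cox setting of this paper, the score would be the gradient of the log partial likelihood at β̂ and the precision matrix estimate would come from the modified CLIME estimator; the function below only shows the algebraic form of the update, with all three inputs assumed to be supplied:

```python
import numpy as np

def debias(beta_hat, Theta_hat, score):
    """One-step debiased estimator: b = beta_hat + Theta_hat @ score.

    beta_hat  : initial sparse (penalized) estimator
    Theta_hat : sparse precision matrix estimate
                (estimated inverse of the negative Hessian)
    score     : score vector evaluated at beta_hat
    """
    return np.asarray(beta_hat) + np.asarray(Theta_hat) @ np.asarray(score)
```

The correction undoes (to first order) the shrinkage bias of the initial penalized fit, which is what makes the resulting linear combinations asymptotically normal and amenable to interval construction.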