Ultrahigh dimensional data are collected in many scientific fields where the predictor dimension is often much higher than the sample size. To reduce the ultrahigh dimensionality effectively, many marginal screening approaches have been developed. However, existing screening methods may miss some important predictors that are marginally independent of the response, or select unimportant ones because of their high correlations with the important predictors. Iterative screening procedures have been proposed to address this issue, but studying their theoretical properties is not straightforward. Penalized regression, in turn, is neither computationally efficient nor numerically stable when the predictors are ultrahigh dimensional. To overcome these drawbacks, Wang (2009) proposed a novel Forward Regression (FR) approach for linear models. However, nonlinear dependence between the predictors and the response is often present in ultrahigh dimensional problems. In this paper, we further extend FR to develop a Forward Additive Regression (FAR) method for selecting significant predictors in ultrahigh dimensional nonparametric additive models. We establish the screening consistency of the FAR method and examine its finite-sample performance through Monte Carlo simulations. The simulations indicate that, compared with marginal screening methods, FAR is much more effective at identifying important predictors for additive models. When the predictors are highly correlated, FAR even outperforms iterative marginal screening methods, such as the iterative nonparametric independence screening (INIS). We also apply the FAR method to a real data analysis in genetic studies.

Key words and phrases: Additive models, forward regression, screening consistency, ultrahigh dimensionality, variable selection.

Statistica Sinica: Newly accepted Paper (accepted author-version subject to English editing). W. Zhong, S. Duan and L. Zhu
Introduction

Advances in modern information technology allow researchers in various scientific fields to collect high dimensional data in which the number of predictors is greater than the sample size. Under the sparsity assumption that only a small subset of predictors truly contribute to the response, penalized regression methods have been intensively studied for various parametric and nonparametric models in the literature. They include, but are not limited to, the LASSO (Tibshirani, 1996) and the Dantzig selector (Candes and Tao, 2007). These methods are able to select significant variables and estimate parameters simultaneously; as a result, both model interpretability and prediction accuracy can be enhanced.

When the predictor dimension is much greater than the sample size, the aforementioned penalized approaches may suffer from high computational complexity, algorithmic instability, or statistical inaccuracy (Fan, Samworth and Wu, 2009). Since the seminal work of Fan and Lv (2008), various marginal screening procedures have been proposed to reduce the ultrahigh dimensionality. The key idea of screening is to rank all predictors using a marginal util...
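As a minimal illustration of the ranking idea behind marginal screening, the sketch below implements a sure-independence-screening-style utility: each predictor is ranked by its absolute marginal sample correlation with the response, and only the top d predictors are retained. The function name `sis_screen`, the simulated model, and the cutoff d are hypothetical choices for this example, not part of the paper's method.

```python
import numpy as np

def sis_screen(X, y, d):
    """Rank predictors by |marginal correlation| with y; keep the top d.

    A simple SIS-style marginal utility: no joint modeling, so a predictor
    correlated with an important one can be selected, and an important
    predictor that is marginally independent of y can be missed.
    """
    Xc = X - X.mean(axis=0)          # center each column
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:d]  # indices of the d largest utilities

# Toy example: n = 100 observations, p = 1000 predictors, two true signals.
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.standard_normal(n)
kept = sis_screen(X, y, d=20)        # screened-down candidate set
```

With strong marginal signals as above, the true predictors (columns 0 and 1) land in the retained set; the paper's point is that this can fail once predictors are highly correlated or signals are only jointly visible.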