The idea of dimension reduction without loss of information can be quite helpful for guiding the construction of summary plots in regression without requiring a prespecified model. Central subspaces are designed to capture all the information for the regression and to provide a population structure for dimension reduction. Here, we introduce the central kth-moment subspace to capture information from the mean, variance and so on, up to the kth conditional moment of the regression. New methods are studied for estimating these subspaces. Connections with sliced inverse regression are established, and examples illustrating the theory are presented.
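As a rough illustration of how these moment subspaces can be targeted (not the estimator proposed in the paper), recall that under the linearity condition on the predictors the ordinary least-squares slope of Y^ℓ on X lies in the central mean subspace of E(Y^ℓ|X) [Li and Duan (1989)], so pooling the slopes for ℓ = 1, ..., k spans part of the central kth-moment subspace. The sketch below is a minimal version of that idea; it assumes full-rank predictors, and the helper name kth_moment_directions is ours.

```python
import numpy as np

def kth_moment_directions(X, y, k=3):
    """Pool OLS slopes of y**l on X for l = 1..k.

    Under the linearity condition, each slope Sigma^{-1} cov(X, y**l)
    lies in the central mean subspace of E(y**l | X), so together the
    slopes span (part of) the central k-th moment subspace.
    """
    n, p = X.shape
    # Standardizing y is numerically safer and leaves the pooled span
    # unchanged, since powers of (y - c)/s are linear combinations of
    # powers of y, and the OLS slope is linear in the response.
    y = (y - y.mean()) / y.std()
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    B = np.empty((p, k))
    for ell in range(1, k + 1):
        yl = y ** ell
        B[:, ell - 1] = np.linalg.solve(Sigma, Xc.T @ (yl - yl.mean()) / n)
    # Orthonormal basis for the pooled span.
    Q, _ = np.linalg.qr(B)
    return Q
```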
In high-dimensional data analysis, sliced inverse regression (SIR) has proven to be an effective dimension reduction tool and has enjoyed wide applications. The usual SIR, however, cannot work with problems where the number of predictors, p, exceeds the sample size, n, and can suffer when there is high collinearity among the predictors. In addition, the reduced dimensional space consists of linear combinations of all the original predictors, so no variable selection is achieved. In this article, we propose a regularized SIR approach based on the least-squares formulation of SIR. An L2 regularization is introduced, along with an alternating least-squares algorithm, to enable SIR to work when n < p and the predictors are highly correlated. An L1 regularization is further introduced to achieve simultaneous dimension reduction and predictor selection. Both simulations and the analysis of a microarray expression data set demonstrate the usefulness of the proposed method.
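The paper's estimator rests on a least-squares formulation of SIR solved by alternating least squares; as a simpler stand-in that conveys what the L2 penalty buys, the sketch below regularizes the classical SIR eigen-decomposition by adding a ridge term to the predictor covariance, which keeps the solve stable when p > n or the predictors are collinear. This is a minimal sketch, not the paper's algorithm; the function name and defaults are our own.

```python
import numpy as np

def ridge_sir(X, y, n_slices=10, ridge=1e-2, n_dirs=2):
    """SIR with an L2 (ridge) regularized covariance, so the solve
    stays stable when p > n or the predictors are highly collinear."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n

    # Slice the response and collect the slice means of the predictors.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Ridge-regularized eigenproblem: (Sigma + ridge*I)^{-1} M b = lam b.
    K = np.linalg.solve(Sigma + ridge * np.eye(p), M)
    eigvals, eigvecs = np.linalg.eig(K)
    top = np.argsort(eigvals.real)[::-1][:n_dirs]
    return eigvecs.real[:, top], eigvals.real[top]
```

The L1 step of the paper would, roughly, replace the ridge solve by a lasso-type fit for each direction inside an alternating loop, zeroing out coefficients and thereby selecting predictors.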
In this paper we propose a dimension reduction method for estimating the directions in a multiple-index regression based on information extraction. This extends the recent work of Yin and Cook [X. Yin, R.D. Cook, Direction estimation in single-index regression, Biometrika 92 (2005) 371-384], who introduced the method and used it to estimate the direction in a single-index regression. While a formal extension seems conceptually straightforward, there is a fundamentally new aspect of our extension: we are able to show that, under the assumption of elliptical predictors, the estimation of a multiple-index regression can be decomposed into successive single-index estimation problems. This significantly reduces the computational complexity, because the nonparametric procedure involves only a one-dimensional search at each stage. In addition, we develop a permutation test to assist in estimating the dimension of a multiple-index regression.
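For concreteness, here is a minimal sketch of a permutation test for the structural dimension in the spirit described above, though the paper's test differs in detail. It uses SIR eigenvalues as the statistic and assumes n > p so the covariance is invertible: under H0: dim = d, the standardized-predictor coordinates orthogonal to the first d estimated directions carry no further information about y, so shuffling them across observations should not inflate the remaining eigenvalues. All names below are ours.

```python
import numpy as np

def _sir_kernel(Z, y, n_slices=10):
    """Between-slice covariance of standardized predictors Z."""
    n, p = Z.shape
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    return M

def permutation_test_dim(X, y, d, n_perm=200, n_slices=10, seed=0):
    """Permutation p-value for H0: the structural dimension is d.

    Statistic: sum of the SIR eigenvalues beyond the d-th."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Standardize: Z = Xc Sigma^{-1/2} (assumes n > p, Sigma invertible).
    U, s, _ = np.linalg.svd(Xc.T @ Xc / n)
    root_inv = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    Z = Xc @ root_inv

    vals, vecs = np.linalg.eigh(_sir_kernel(Z, y, n_slices))
    vals, vecs = vals[::-1], vecs[:, ::-1]      # descending order
    stat = vals[d:].sum()

    C = Z @ vecs                                # coordinates in SIR basis
    hits = 0
    for _ in range(n_perm):
        Cp = C.copy()
        Cp[:, d:] = Cp[rng.permutation(n), d:]  # shuffle the rest
        vp = np.linalg.eigvalsh(_sir_kernel(Cp @ vecs.T, y, n_slices))
        hits += vp[::-1][d:].sum() >= stat
    return (1 + hits) / (1 + n_perm)
```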
We introduce a class of dimension reduction estimators based on an ensemble of the minimum average variance estimates of functions that characterize the central subspace, such as characteristic functions, Box-Cox transformations and wavelet bases. The ensemble estimators exhaustively estimate the central subspace without imposing restrictive conditions on the predictors, and have the same convergence rate as the minimum average variance estimates. They are flexible and easy to implement, and allow repeated use of the available sample, which enhances accuracy. They are applicable to both univariate and multivariate responses in a unified form. We establish the consistency and convergence rate of these estimators, and the consistency of a cross-validation criterion for order determination. We compare the ensemble estimators with other estimators in a wide variety of models, and establish their competitive performance.

From the paper's introduction: the intersection of all subspaces S such that Y is independent of X given P_S X is called the central subspace [Cook (1994)], denoted S_{Y|X}. Under mild conditions [Cook (1996); Yin, Li and Cook (2008)], the central subspace is well defined and unique. A closely related concept is the central mean subspace [Cook and Li (2002)], the intersection of all subspaces S such that E(Y|X) = E(Y|P_S X); it is written S_{E(Y|X)}. Evidently, if the conditional distribution of Y given X depends on X only through E(Y|X), then S_{Y|X} = S_{E(Y|X)}; however, if this conditional distribution also depends on other functions of X, such as var(Y|X), then S_{E(Y|X)} is a proper subspace of S_{Y|X}. Cook and Li (2002) noted that several previously introduced dimension reduction methods, such as ordinary least squares [Li and Duan (1989); Duan and Li (1991)] and principal Hessian directions [Li (1992); Cook (1998)], actually estimate the central mean subspace, whereas some other pre-existing estimators, such as sliced inverse regression (SIR), SIR-II [Li (1991)] and the sliced average variance estimator (SAVE) [Cook and Weisberg (1991)], can recover additional directions in the central subspace. Yin and Cook (2002) introduced the central kth-moment subspace, which provides a graduation between the central mean subspace and the central subspace: for sufficiently large k, the subspace spanned by {S_{E(Y^ℓ|X)}, ℓ = 1, ..., k} approaches the central subspace. Zhu and Zeng (2006) showed that the central mean subspaces of E(e^{ιtY}|X), t ∈ R, when put together, recover the central subspace, and exploited this relation to develop a Fourier transform method for estimating it; here and throughout, ι denotes the imaginary unit √−1. More recently, Zeng and Zhu (2010) developed a general integral transform method. Both papers hint at the following fact: if one can estimate the central mean subspace of E[f(Y)|X] for a sufficiently rich family of functions f, then one can recover the central subspace.
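The paper's ensemble is built on minimum average variance estimation (MAVE); as a much cruder stand-in that still shows how central mean subspaces for a family of transforms combine to recover the central subspace, the sketch below pools OLS-based directions over the characteristic-function family {cos(tY), sin(tY)}. This is valid only under the linearity condition, not in the generality of the paper, and the grid and function name are our own choices.

```python
import numpy as np

def ensemble_directions(X, y, t_grid=None, n_dirs=2):
    """Pool central-mean-subspace estimates over the transform family
    f_t(y) = (cos(t*y), sin(t*y)).

    Each OLS slope Sigma^{-1} cov(X, f_t(y)) lies in the central mean
    subspace of E(f_t(y) | X) under the linearity condition; summing
    outer products and taking top eigenvectors pools the ensemble into
    a single estimate of the central subspace.
    """
    if t_grid is None:
        t_grid = np.linspace(0.1, 3.0, 30) / np.std(y)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    M = np.zeros((p, p))
    for t in t_grid:
        for f in (np.cos(t * y), np.sin(t * y)):
            b = np.linalg.solve(Sigma, Xc.T @ (f - f.mean()) / n)
            M += np.outer(b, b)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :n_dirs]
```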
This paper discusses visualization methods for discriminant analysis. It does not address numerical methods for classification per se, but rather focuses on graphical methods that can be viewed as pre-processors, aiding the analyst's understanding of the data and the choice of a final classifier. The methods are adaptations of recent results in dimension reduction for regression, including sliced inverse regression and sliced average variance estimation. A permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion.
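As one hedged illustration of such a graphical pre-processor: with class labels, the SIR/SAVE "slices" are simply the classes, so SAVE directions can be computed and the data projected onto the leading two for a class-colored summary plot. The sketch below standardizes the predictors, forms the SAVE kernel sum_g p_g (I - cov(Z_g))^2 over the classes, and back-transforms its leading eigenvectors; it assumes n > p and at least two observations per class, and the helper name is ours.

```python
import numpy as np

def save_directions(X, labels, n_dirs=2):
    """SAVE directions for grouped/classification data.

    Standardize X, form the SAVE kernel over the classes, and map its
    leading eigenvectors back to the original predictor scale."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc.T @ Xc / n)
    root_inv = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    Z = Xc @ root_inv                       # standardized predictors
    K = np.zeros((p, p))
    for g in np.unique(labels):
        Zg = Z[labels == g]
        A = np.eye(p) - np.cov(Zg, rowvar=False)
        K += (len(Zg) / n) * A @ A          # class-weighted (I - cov)^2
    _, vecs = np.linalg.eigh(K)
    return root_inv @ vecs[:, ::-1][:, :n_dirs]
```

A scatter plot of X @ B[:, 0] against X @ B[:, 1], colored by class, then serves as the kind of summary plot discussed above, helping the analyst judge how many directions separate the groups before choosing a classifier.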