In this paper we consider the Regularization of Derivative Expectation Operator (Rodeo) of Lafferty and Wasserman (2008) and propose a modified Rodeo algorithm for semiparametric single index models in a big data environment with many regressors. The method assumes sparsity, that is, that many of the regressors are irrelevant. It uses a greedy algorithm: to estimate the semiparametric single index model (SIM) of Ichimura (1993), all coefficients of the regressors are initially set near zero, and we then test iteratively whether the derivative of the regression function estimator with respect to each coefficient is significantly different from zero. The basic idea of the modified Rodeo algorithm for SIM (called SIM-Rodeo) is to view local bandwidth selection as a variable selection scheme that amplifies the coefficients of relevant variables while keeping the coefficients of irrelevant variables relatively small or at their initial values near zero. For sparse semiparametric single index models, the SIM-Rodeo algorithm is shown to attain consistency in variable selection. In addition, the greedy steps finish quickly. We compare SIM-Rodeo with the SIM-Lasso method of Zeng et al. (2012). Our simulation results demonstrate that the proposed SIM-Rodeo method is consistent for variable selection and has smaller integrated mean squared errors than SIM-Lasso.
The exact finite sample distribution of the F statistic using the heteroskedasticity-consistent (HC) covariance matrix estimators of the regression parameter estimators is unknown. In this paper, we derive the exact finite sample distribution of the F (= t^2) statistic for a single linear restriction on the regression parameters. We show that the F statistic can be expressed as a ratio of quadratic forms, and therefore its exact cumulative distribution under the null hypothesis can be derived from the result of Imhof (1961). A numerical calculation is carried out for the exact distribution of the F statistic using various HC covariance matrix estimators, and the rejection probability under the null hypothesis (size) based on the exact distribution is examined. The results show that the exact finite sample distribution is remarkably reliable, whereas the use of the F-table leads to serious over-rejection when the sample is not large or is leveraged/unbalanced. An empirical application highlights that the use of the exact distribution of the F statistic will increase the accuracy of inference in empirical research.
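The ratio-of-quadratic-forms representation can be illustrated numerically. The following is a minimal sketch, not the paper's code: it assumes the HC0 estimator (the abstract only says "various HC covariance matrix estimators"), a hypothetical simulated design, and illustrative function names. Writing a = X(X'X)^{-1}c and M = I - X(X'X)^{-1}X', under a true null the statistic equals u'Au / u'Bu with A = aa' and B = M diag(a_i^2) M.

```python
import numpy as np

def hc0_t_squared(X, y, c, r=0.0):
    """F (= t^2) statistic for H0: c'beta = r with the HC0 covariance estimator."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    # HC0 sandwich: (X'X)^-1 X' diag(resid^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    V = XtX_inv @ meat @ XtX_inv
    return float((c @ beta_hat - r) ** 2 / (c @ V @ c))

def quadratic_form_matrices(X, c):
    """Matrices A, B such that F = u'Au / u'Bu when H0 is true (HC0 case)."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    a = X @ XtX_inv @ c                    # c'beta_hat - c'beta = a'u
    M = np.eye(n) - X @ XtX_inv @ X.T      # residual-maker matrix
    A = np.outer(a, a)
    B = M @ np.diag(a ** 2) @ M
    return A, B

# Hypothetical small, heteroskedastic sample (assumption for illustration).
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 0.0, 0.5])
u = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
y = X @ beta + u

c = np.array([0.0, 1.0, 0.0])              # H0: beta_1 = 0, true in this design
F_direct = hc0_t_squared(X, y, c, r=0.0)
A, B = quadratic_form_matrices(X, c)
F_quadratic = float(u @ A @ u) / float(u @ B @ u)
```

Because P(F <= x) = P(u'(A - xB)u <= 0), the null distribution reduces to that of a single quadratic form in the errors, which is the type of problem Imhof's (1961) result addresses for normal errors; the numerical inversion itself is not sketched here.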
Freund and Schapire (1997) introduced "Discrete AdaBoost" (DAB), which has been mysteriously effective for high-dimensional binary classification or binary prediction. In an effort to understand the myth, Friedman, Hastie and Tibshirani (FHT, 2000) show that DAB can be understood as statistical learning which builds an additive logistic regression model via Newton-like updating minimization of the "exponential loss". From this statistical point of view, FHT proposed three modifications of DAB, namely, Real AdaBoost (RAB), LogitBoost (LB), and Gentle AdaBoost (GAB). DAB, RAB, LB, and GAB all solve for the logistic regression via different algorithmic designs and different objective functions. The RAB algorithm uses class probability estimates to construct real-valued contributions of the weak learner, LB is an adaptive Newton algorithm by stagewise optimization of the Bernoulli likelihood, and GAB is an adaptive Newton algorithm via stagewise optimization of the exponential loss. The authors of FHT published an influential textbook, The Elements of Statistical Learning (ESL, 2001 and 2008). A companion book, An Introduction to Statistical Learning (ISL) by James et al. (2013), was published with applications in R. However, neither ESL nor ISL (e.g., sections 4.5 and 4.6) covers these four AdaBoost algorithms, while FHT provided some simulation and empirical studies to compare these methods. Given numerous potential applications, we believe it would be useful to collect the R libraries of these AdaBoost algorithms, as well as more recently developed extensions to AdaBoost for probability prediction, with examples and illustrations.
Therefore, the goal of this chapter is to do just that, i.e., (i) to provide a user guide to these alternative AdaBoost algorithms with a step-by-step tutorial in R (in a way similar to ISL, e.g., Section 4.6), (ii) to compare AdaBoost with alternative machine learning classification tools such as the deep neural network (DNN), logistic regression with LASSO
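For readers unfamiliar with Discrete AdaBoost, the reweighting scheme can be sketched in a few lines. This is a minimal illustration, not the chapter's R tutorial: it is written in Python, uses decision stumps as the weak learner, and the toy data and all function names are assumptions made for the example. Each round fits a stump to the weighted sample, assigns it the coefficient c_m = log((1 - err_m) / err_m), and up-weights the misclassified points.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, -sign, sign)
                err = float(np.sum(w[pred != y]))
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(X, j, thr, sign):
    """Predict -sign at or below the threshold and +sign above it."""
    return np.where(X[:, j] <= thr, -sign, sign)

def discrete_adaboost(X, y, n_rounds=3):
    """Discrete AdaBoost with labels in {-1, +1}; returns weighted stumps."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                      # start from uniform weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the log finite
        c_m = np.log((1.0 - err) / err)
        miss = stump_predict(X, j, thr, sign) != y
        w = w * np.exp(c_m * miss)               # up-weight misclassified points
        w = w / w.sum()
        ensemble.append((c_m, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(c * stump_predict(X, j, thr, s) for c, j, thr, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data (assumption): the positive class is an interval, so no single
# stump can separate the classes.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([-1, -1, -1, 1, 1, 1, 1, -1, -1, -1])

single_acc = float(np.mean(predict(discrete_adaboost(X, y, n_rounds=1), X) == y))
boosted = discrete_adaboost(X, y, n_rounds=3)
boosted_acc = float(np.mean(predict(boosted, X) == y))
```

On this toy sample a single stump classifies 7 of the 10 points correctly, while the three-round ensemble classifies all of them; this is only an in-sample illustration and says nothing about out-of-sample performance.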