Instrumental variables and GMM: Estimation and testing

We discuss instrumental variables (IV) estimation in the broader context of the generalized method of moments (GMM), and describe an extended IV estimation routine that provides GMM estimates as well as additional diagnostic tests. Stand-alone test procedures for heteroskedasticity, overidentification, and endogeneity in the IV context are also described.

We also include a discussion of intra-group correlation, or "clustering". If the error terms in the regression are correlated within groups, but not correlated across groups, then the consequences for IV estimation are similar to those of heteroskedasticity: the IV coefficient estimates are consistent, but their standard errors and the usual forms of the diagnostic tests are not. We discuss how clustering can be interpreted in the GMM context and how it can be dealt with in Stata to make efficient estimation, valid inference, and diagnostic testing possible.

Efficient GMM brings with it the advantage of consistency in the presence of arbitrary heteroskedasticity, but at the cost of possibly poor finite-sample performance. If heteroskedasticity is in fact not present, then standard IV may be preferable. The usual Breusch-Pagan/Godfrey/Cook-Weisberg and White/Koenker tests for the presence of heteroskedasticity in a regression equation can be applied to an IV regression only under restrictive assumptions. In Section 3, we discuss the test of Pagan and Hall (1983), designed specifically for detecting the presence of heteroskedasticity in IV estimation, and its relationship to these other heteroskedasticity tests.

Even when IV or GMM is judged to be the appropriate estimation technique, we may still question its validity in a given application: are our instruments "good instruments"? This is the question we address in Section 4. "Good instruments" should be both relevant and valid: correlated with the endogenous regressors and at the same time orthogonal to the errors. Correlation with the endogenous regressors can be assessed by examining the significance of the excluded instruments in the first-stage IV regressions. We may cast some light on whether the instruments satisfy the orthogonality conditions in the context of an overidentified model: that is, one in which a surfeit of instruments is available.
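To fix ideas, the estimators under discussion can be sketched as follows; the notation here is the standard GMM notation, not necessarily that of the article itself:

```latex
% Linear model with regressors X (n x K) and instruments Z (n x L), L >= K:
%   y = X\beta + u, \qquad E[Z_i u_i] = 0.
% Sample moment conditions:
\bar{g}(\beta) = \tfrac{1}{n}\, Z'(y - X\beta)
% GMM minimizes n\,\bar{g}(\beta)' W \bar{g}(\beta) for a weight matrix W, giving
\hat{\beta}_{GMM} = \left( X'Z\, W\, Z'X \right)^{-1} X'Z\, W\, Z'y
% Efficient (two-step) GMM sets W = \hat{S}^{-1}, with the
% heteroskedasticity-robust estimate
\hat{S} = \tfrac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2\, Z_i Z_i'
% Standard IV/2SLS is the special case W \propto (Z'Z/n)^{-1}.
```

When the errors are homoskedastic, the 2SLS weight matrix is already efficient, which is why efficient GMM offers no asymptotic gain in that case.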
In that context, we may test the overidentifying restrictions in order to provide some evidence of the instruments' validity. We present the variants of this test due to Sargan (1958), Basmann (1960), and, in the GMM context, Hansen (1982), and show how a generalization of this test, the C or "difference-in-Sargan" test, can be used to test the validity of subsets of the instruments.

Although there may well be reason to suspect nonorthogonality between regressors and errors, the use of IV estimation to address this problem must be balanced against the inevitable loss of efficiency vis-à-vis OLS. It is therefore very useful to have a test of whether or not OLS is inconsistent and IV or GMM is required. This is the Durbin-Wu-Hausman test.
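As a concrete illustration of the overidentification test just described, the following minimal numpy sketch (simulated data; this is not the article's Stata routine) computes a 2SLS estimate with one endogenous regressor and two instruments, then forms the Sargan statistic from the 2SLS residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two valid instruments z1, z2; regressor x is endogenous because it
# shares the component v with the structural error u.
z = rng.standard_normal((n, 2))
v = rng.standard_normal(n)
u = 0.5 * v + rng.standard_normal(n)
x = z @ np.array([1.0, 1.0]) + v
y = 2.0 * x + u                                 # true slope is 2

X = np.column_stack([np.ones(n), x])            # constant + endogenous regressor (K = 2)
Z = np.column_stack([np.ones(n), z])            # constant + z1 + z2 (L = 3): overidentified

# 2SLS: beta = (X' Pz X)^{-1} X' Pz y, with Pz the projection onto Z
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

# Sargan statistic: n * uhat' Pz uhat / uhat' uhat, i.e., n times the
# uncentered R^2 from regressing the 2SLS residuals on Z.
# Under valid instruments it is asymptotically chi2(L - K) = chi2(1).
uhat = y - X @ beta
sargan = n * (uhat @ Pz @ uhat) / (uhat @ uhat)

print(f"2SLS coefficients: {beta}, Sargan statistic: {sargan:.3f}")
```

With valid instruments, as here, the slope estimate should be close to the true value and the Sargan statistic small relative to chi-squared critical values; making one instrument correlated with u would inflate the statistic.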