Stefan Wager scite author profile

Many scientific and engineering challenges-ranging from personalized medicine to customized marketing recommendations-require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.

show abstract

Generalized random forests

Athey¹,

Tibshirani²,

Wager³

2019

Ann. Statist.

1,148

1,158

View full text Add to dashboard Cite

We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN. imsart-aos ver.

show abstract

Quasi-oracle estimation of heterogeneous treatment effects

2020

View full text Add to dashboard Cite

Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical applications, such as personalized medicine and optimal resource allocation. In this article we develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. First, we estimate marginal effects and treatment propensities to form an objective function that isolates the causal component of the signal. Then, we optimize this data-adaptive objective function. The proposed approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: in both steps, any loss-minimization method can be employed, such as penalized regression, deep neural networks, or boosting; moreover, these methods can be fine-tuned by cross-validation. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property. Even when the pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same error bounds as an oracle with prior knowledge of these two nuisance components. We implement variants of our approach based on penalized regression, kernel ridge regression, and boosting in a variety of simulation set-ups, and observe promising performance relative to existing baselines.

show abstract

Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions

2018

View full text Add to dashboard Cite

Summary There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pretreatment variables. The unconfoundedness assumption is often more plausible if a large number of pretreatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. We develop a method for debiasing penalized regression adjustments to allow sparse regression methods like the lasso to be used for √n‐consistent inference of average treatment effects in high dimensional linear models. Given linearity, we do not need to assume that the treatment propensities are estimable, or that the average treatment effect is a sparse contrast of the outcome model parameters. Rather, in addition to standard assumptions used to make lasso regression on the outcome model consistent under 1‐norm error, we require only overlap, i.e. that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.

show abstract

High-dimensional asymptotics of prediction: Ridge regression and classification

Dobriban¹,

Wager²

2018

Ann. Statist.

181

223

View full text Add to dashboard Cite

We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model. We work in a high-dimensional asymptotic regime where p, n → ∞ and p/n → γ ∈ (0, ∞), and allow for arbitrary covariance among the features. For both methods, we provide an explicit and efficiently computable expression for the limiting predictive risk, which depends only on the spectrum of the feature-covariance matrix, the signal strength, and the aspect ratio γ. Especially in the case of regularized discriminant analysis, we find that predictive accuracy has a nuanced dependence on the eigenvalue distribution of the covariance matrix, suggesting that analyses based on the operator norm of the covariance matrix may not be sharp. Our results also uncover several qualitative insights about both methods: for example, with ridge regression, there is an exact inverse relation between the limiting predictive risk and the limiting estimation risk given a fixed signal strength. Our analysis builds on recent advances in random matrix theory.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Stefan Wager

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Generalized random forests

Quasi-oracle estimation of heterogeneous treatment effects

Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions

High-dimensional asymptotics of prediction: Ridge regression and classification

Contact Info

Product

Resources

About