Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods

Lü, Min; Sadiq, Saad; Feaster, Daniel J.; Ishwaran, Hemant

doi:10.1080/10618600.2017.1356325

Cited by 135 publications

(108 citation statements)

References 33 publications

Mentioning: 106

“…However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data characteristics such as sparsity, correlation among baseline covariates and confounding, are complex. Also, although the performance of causal forests has been previously compared with that of BART and of causal boosting, the 4 algorithms have never been compared head to head. In this paper, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness and safety studies using health care databases.…”

Section: Introduction (mentioning)

confidence: 99%

Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases

Wendling, Jung, Callahan, et al. 2018. Statistics in Medicine

There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies.
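
For reference, the conditional risk difference named in the abstract above can be written as follows. The notation is an assumption made here and is not taken from the cited paper: Y is the binary outcome, A the binary treatment, X the baseline covariates, and Y(a) the potential outcome under treatment a. The second equality holds only under the usual identification assumptions (consistency, positivity, and no unmeasured confounding given X).

```latex
\tau(x) \;=\; \Pr\{Y(1) = 1 \mid X = x\} - \Pr\{Y(0) = 1 \mid X = x\}
        \;=\; \Pr(Y = 1 \mid A = 1, X = x) - \Pr(Y = 1 \mid A = 0, X = x)
```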

“…Counterfactual random forest (Lu et al 2018) is similar to VT-RF in that they both calculate ITE by taking the difference between predictions of random forest models. However, CF-RF is different from VT-RF by fitting two separate random forests: a control forest fitted with control samples, and a treatment forest fitted with treatment samples.…”

Section: Counterfactual Random-forest (Cf-rf) (mentioning)

confidence: 99%

When Do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception Using Individual Treatment Effect Estimation

Wang, Culotta. 2019. AAAI

Studies across many disciplines have shown that lexical choice can affect audience perception. For example, how users describe themselves in a social media profile can affect their perceived socio-economic status. However, we lack general methods for estimating the causal effect of lexical choice on the perception of a specific sentence. While randomized controlled trials may provide good estimates, they do not scale to the potentially millions of comparisons necessary to consider all lexical choices. Instead, in this paper, we first offer two classes of methods to estimate the effect on perception of changing one word to another in a given sentence. The first class of algorithms builds upon quasi-experimental designs to estimate individual treatment effects from observational data. The second class treats treatment effect estimation as a classification problem. We conduct experiments with three data sources (Yelp, Twitter, and Airbnb), finding that the algorithmic estimates align well with those produced by randomized-control trials. Additionally, we find that it is possible to transfer treatment effect classifiers across domains and still maintain high accuracy.
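
The two-forest construction described in the citation statement for this entry (a control forest fitted on control samples, a treatment forest fitted on treated samples, and the ITE taken as the difference of their predictions) can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation of Lu et al. or of the citing paper: the forest is a small hand-rolled bagged ensemble of randomized regression trees using only the standard library, and every function name, parameter value, and the synthetic data are hypothetical.

```python
import random
from statistics import fmean


def grow_tree(X, y, rng, depth=4, min_leaf=5):
    """Grow one regression tree. A leaf is a float; a split is a tuple."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return fmean(y)
    best = None
    n_features = len(X[0])
    # Consider a random subset of features at each split.
    for j in rng.sample(range(n_features), max(1, n_features // 2)):
        # Try a handful of random thresholds to keep the sketch fast.
        for t in rng.sample([row[j] for row in X], min(8, len(X))):
            left = [i for i, row in enumerate(X) if row[j] < t]
            right = [i for i, row in enumerate(X) if row[j] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            mean_l = fmean([y[i] for i in left])
            mean_r = fmean([y[i] for i in right])
            sse = (sum((y[i] - mean_l) ** 2 for i in left)
                   + sum((y[i] - mean_r) ** 2 for i in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return fmean(y)
    _, j, t, left, right = best
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], rng, depth - 1, min_leaf),
            grow_tree([X[i] for i in right], [y[i] for i in right], rng, depth - 1, min_leaf))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] < t else right
    return tree


def fit_forest(X, y, n_trees=30, seed=0):
    """Fit each tree on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return trees


def predict_forest(trees, row):
    """Average the predictions of all trees."""
    return fmean([predict_tree(tree, row) for tree in trees])


def fit_counterfactual_forests(X, treated, y, seed=0):
    """Fit one forest on control rows only and one on treated rows only."""
    ctrl = [i for i, w in enumerate(treated) if w == 0]
    trt = [i for i, w in enumerate(treated) if w == 1]
    control_forest = fit_forest([X[i] for i in ctrl], [y[i] for i in ctrl], seed=seed)
    treatment_forest = fit_forest([X[i] for i in trt], [y[i] for i in trt], seed=seed + 1)
    return control_forest, treatment_forest


def estimate_ite(forests, row):
    """ITE estimate: treated-forest prediction minus control-forest prediction."""
    control_forest, treatment_forest = forests
    return predict_forest(treatment_forest, row) - predict_forest(control_forest, row)


# Synthetic data (made up for illustration): the true effect is 2 when
# x2 > 0.5 and 0 otherwise; treatment assignment depends on x1.
rng = random.Random(42)
X, treated, y = [], [], []
for _ in range(400):
    x1, x2 = rng.random(), rng.random()
    w = 1 if rng.random() < 0.3 + 0.4 * x1 else 0
    effect = 2.0 if x2 > 0.5 else 0.0
    X.append([x1, x2])
    treated.append(w)
    y.append(x1 + w * effect + rng.gauss(0, 0.1))

forests = fit_counterfactual_forests(X, treated, y)
ite = [estimate_ite(forests, row) for row in X]
high = fmean([e for e, row in zip(ite, X) if row[1] > 0.5])
low = fmean([e for e, row in zip(ite, X) if row[1] <= 0.5])
print(f"mean estimated ITE: x2 > 0.5 -> {high:.2f}, x2 <= 0.5 -> {low:.2f}")
```

On this toy data the estimates should be clearly larger for rows with x2 > 0.5 than for the rest. In practice one would replace the hand-rolled forest with an established random forest library; the point of the sketch is only the split into two separately fitted forests and the differencing step.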

“…Specifically, there is a growing number of literatures regarding the efficient estimation of ITE (e.g. Kehl and Ulm, 2006; Tian et al, 2014; Chen et al, 2017; Lu et al, 2018; Wager and Athey, 2018; Zhang et al, 2017) among many others. Although these methods can effectively estimate ITE, the estimated model is typically too complicated, and it would be difficult to understand which biomarkers are actually associate with ITE.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient screening of predictive biomarkers for individual treatment selection

Noma. 2020. Biometrics

The development of molecular diagnostic tools to achieve individualized medicine requires identifying predictive biomarkers associated with subgroups of individuals who might receive beneficial or harmful effects from different available treatments. However, due to the large number of candidate biomarkers in the large-scale genetic and molecular studies, and complex relationships among clinical outcome, biomarkers and treatments, the ordinary statistical tests for the interactions between treatments and covariates have difficulties from their limited statistical powers. In this paper, we propose an efficient method for detecting predictive biomarkers. We employ weighted loss functions of Chen et al. (2017) to directly estimate individual treatment scores and propose synthetic posterior inference for effect sizes of biomarkers. We develop an empirical Bayes approach, namely, we estimate unknown hyperparameters in the prior distribution based on data. We then provide the efficient ranking and selection method of the candidate biomarkers based on this framework with adequate control of false discovery rate. The proposed model is demonstrated in simulation studies and an application to a breast cancer clinical study in which the proposed method was shown to detect the much larger numbers of significant biomarkers than the current standard methods.
