2018
DOI: 10.1515/em-2017-0020

Propensity Score Estimation Using Classification and Regression Trees in the Presence of Missing Covariate Data

Abstract: Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the fitting of a logistic regression on all subjects, CART is appealing in part because some implementations allow for incomplete records to be incorporated in the tree fitting and provide propensity score estimates for all subjects. Based on theoretical considerations, we argue that…

Cited by 8 publications (11 citation statements)
References 48 publications (77 reference statements)
“…In our approach, the substantive model of interest and covariates to include in the propensity score model was explicit. It has been suggested that machine learning or "black box" algorithms may provide reasonable propensity score-weights (4,26,27), however, at the cost of control over the substantive model which is paramount in fulfilling one of the assumptions of multiple imputation: a correctly specified substantive model of interest. And as Bartlett et al notes "We do not consider the requirement to specify a substantive model at the imputation stage to be a shortcoming…" (11).…”
Section: Discussion (mentioning)
confidence: 99%
“…We could incorporate this information in a cardiovascular risk prediction model; the lower the cholesterol the more favourable the prognosis and the lack of a cholesterol measurement is most favourable. When developing a clinical prediction model, an indicator for missingness could be added to a regression model [10,15,16]. Also, machine learning techniques such as classification and regression trees can accommodate missing data by including separate categories for missing values [17,18].…”
Section: Informative Missingness In Electronic Health Records Data (mentioning)
confidence: 99%
“…Given the computational burden involved in calculating the GPS (De Vries et al 2018), we were unable to perform analysis over all required Monte Carlo simulations (n_sim = 1053). We therefore iteratively reduced the number of n_sim, and used the number of simulated sets for which we had enough memory capacity (that was the case for n_sim = 10).…”
Section: Performance Measures (mentioning)
confidence: 99%
“…Unlike for PS analysis, only one recent study, by De Vries, Van Smeden, and Groenwold (2018), considered the combination of missing data and the GPS (De Vries, Van Smeden, and Groenwold 2018). The authors found that multiple imputations of data, followed by PS estimation using Classification and Regression Trees (CART) resulted in least biased estimates (De Vries, Van Smeden, and Groenwold (2018)). However, no previous study has assessed balance for the GPS under multiple imputations, partly due to the computational burden involved (De Vries, Van Smeden, and Groenwold (2018)).…”
Section: Introduction (mentioning)
confidence: 99%