2023
DOI: 10.1515/jci-2022-0078

Double machine learning and automated confounder selection: A cautionary tale

Abstract: Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls”…
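The sensitivity to a single bad control described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the article's own experiment: it uses plain least squares for the residual-on-residual (partialling-out) step instead of machine learning nuisance models, and it skips cross-fitting. The data-generating process, the coefficients, and the names `partial_out_effect` and `simulate` are assumptions made up for illustration.

```python
import numpy as np


def partial_out_effect(y, t, controls):
    """Residual-on-residual estimate of the effect of t on y.

    Linear stand-in for the partialling-out step used in DML:
    ordinary least squares, no ML learners, no cross-fitting.
    """
    X = np.column_stack([np.ones(len(y)), controls])
    # Residualize the outcome and the treatment on the controls.
    y_res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    t_res = t - X @ np.linalg.lstsq(X, t, rcond=None)[0]
    return float(t_res @ y_res / (t_res @ t_res))


def simulate(n=20000, true_effect=1.0, seed=0):
    """Hypothetical data: one confounder x and one collider c."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # observed confounder
    t = 0.8 * x + rng.normal(size=n)            # treatment
    y = true_effect * t + 0.5 * x + rng.normal(size=n)
    c = t + y + rng.normal(size=n)              # common effect of T and Y
    good = partial_out_effect(y, t, x.reshape(-1, 1))
    bad = partial_out_effect(y, t, np.column_stack([x, c]))
    return good, bad


if __name__ == "__main__":
    good, bad = simulate()
    print(f"confounder only:       {good:.3f}")
    print(f"confounder + collider: {bad:.3f}")
```

With only the confounder in the control set, the estimate should land close to the assumed true effect of 1.0. Adding the collider pulls it far away from that value (toward zero in this particular design), even though every other part of the procedure is unchanged.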

Citations: cited by 8 publications (4 citation statements)
References: 35 publications
“…In particular, this means we need to be careful not to include mediators that have an unobserved common cause with Y or that we introduce a common effect of T and Y. Both cases would open a new path and substantially bias the estimation [70].…”
Section: DML for Hybrid Modeling - A Causal Perspective (mentioning)
confidence: 99%
“…The covariate set was selected based on economic reasoning on the causal effect paths between a farm's AES participation and the respective outcome (Uehleke et al, 2022) and previous studies on AES participation (e.g., Pufahl & Weiss, 2009). By choosing the covariate set in this manner, we reduced the possibility of matching on covariates that might have increased bias of the effect estimates (Hünermund et al, 2022;Wooldridge, 2016).…”
Section: Data (mentioning)
confidence: 99%
“…These estimators are robust from a statistical standpoint, but not necessarily a causal identification one. The researcher must know which variables are possible confounders and to include them in the appropriate models, while not including colliders or mediators (Hünermund et al, 2023). The simulations discussed in this paper assume conditional ignorability; rather than testing what happens when models are missing important covariates, it focuses on accurate specification of the functional form of the treatment and outcome models.…”
Section: Conceptual Overview (mentioning)
confidence: 99%