On Pearl’s Hierarchy and the Foundations of Causal Inference

Bareinboim, Elias; Correa, Juan D.; Ibeling, Duligur; Icard, Thomas

doi:10.1145/3501714.3501743

Cited by 93 publications

(136 citation statements)

References 42 publications

Supporting

Mentioning

136

Contrasting

“…Most of the current datasets used in research and industry are observational, and therefore learning the complete causal graph is impossible [23,1,24]. Several causal discovery techniques exist that can learn causal graphs from interventional data [13], e.g., data that shows how features of specific individuals evolved over time.…”

Section: Challenges In Operationalizing CFEs (mentioning)

confidence: 99%

Counterfactual Explanations for Machine Learning: Challenges Revisited

Verma, Dickerson, Hines

2021

Preprint

Counterfactual explanations (CFEs) are an emerging technique under the umbrella of interpretability of machine learning (ML) models. They provide "what if" feedback of the form "if an input datapoint were x' instead of x, then an ML model's output would be y' instead of y." Counterfactual explainability for ML models has yet to see widespread adoption in industry. In this short paper, we posit reasons for this slow uptake. Leveraging recent work outlining desirable properties of CFEs and our experience running the ML wing of a model monitoring startup, we identify outstanding obstacles hindering CFE deployment in industry.

“…OOD generalization fundamentally requires additional information beyond i.i.d. data [6,67] counterfactual examples [25,72], or non-stationary time series [23,30,57].…”

Section: Additional Considerations (mentioning)

confidence: 99%

“…OOD Generalization fundamentally requires extra information beyond i.i.d. training examples [6,67]. Existing methods use side information such as multiple training environments [1,12,55], counterfactual examples [25,72], or non-stationary time series [23,30,57].…”

Section: Introduction (mentioning)

confidence: 99%

“…Existing methods use side information such as multiple training environments [1,12,55], counterfactual examples [25,72], or non-stationary time series [23,30,57]. Importantly, OOD generalization is not achievable only through regularizers, network architectures, or unsupervised control of inductive biases [6]. To make this limitation intuitive, consider the task of image recognition in Figure 1.…”

Section: Introduction (mentioning)

confidence: 99%

“…This information is lost by sampling i.i.d. training examples from the joint distribution produced by the data-generating process [6]. Current approaches to recover the missing information use multiple training environments [1,12,55], counterfactual examples [25,72], or non-stationary time series [23,30,57].…”

Section: Introduction (mentioning)

confidence: 99%

Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization

Teney, Abbasnejad, Lucey, et al.

2021

Preprint

Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side information. Our approach shows that we can defer this requirement to an independent model selection stage. We obtain SOTA results in visual recognition on biased data and generalization across visual domains. The method, the first to evade the simplicity bias, highlights the need for a better understanding and control of inductive biases in deep learning.
