Testing conditional independence in supervised learning algorithms

Watson, David S.; Wright, Marvin N.

doi:10.1007/s10994-021-06030-6

Cited by 32 publications

(24 citation statements)

References 61 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Model-agnostic PFI confidence intervals that are similar to ours are proposed by Watson and Wright (2019); Williamson et al (2019Williamson et al ( , 2020. We additionally correct for variance underestimation arising from resampling (Nadeau and Bengio, 2003) and relate the estimators to the proposed ground truth PFI.…”

Section: Related Worksupporting

confidence: 75%

“…This means that the marginal PFI breaks the association between the feature(s) X S and the target Y , but also between X S and all other features X C . For the conditional PFI (cPFI) (Molnar et al, 2020;Watson and Wright, 2019;Hooker and Mentch, 2019;Candès et al, 2018), the expectation is taken over the distribution P X S |X C •P X C Y , so that XS follows the conditional distribution of X S given X C but is still independent of Y . The interpretation of the conditional PFI of a feature is therefore also conditional on all features that are correlated with the feature of interest.…”

Section: Permutation Feature Importance (Pfi)mentioning

confidence: 99%

“…S can be a permutation of the original vector x S . The conditional PFI requires a conditional sampling mechanism for the feature, such as subgroups (Molnar et al, 2020) or knockoffs (Candès et al, 2018;Watson and Wright, 2019). The estimation of P F I requires unseen data, so that the loss estimates deliver unbiased results (Zheng and van der Laan, 2011; Chernozhukov et al, 2018).…”

Section: Permutation Feature Importance (Pfi)mentioning

confidence: 99%

“…; Fisher A model f deviates from f due to model bias and variance. Similarly P D and P F I estimates deviate from their ground truth versions DGP-PD and DGP-PFI due to bias, variance, and Monte Carlo integration (MC) Watson and Wright, 2019;Apley and Zhu, 2020). have been proposed.…”

mentioning

confidence: 99%

See 3 more Smart Citations

Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

Molnar,

Freiesleben,

König

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity. However, their model parameters usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (PD) plots and permutation feature importance (PFI) are often used as interpretation methods. However, PD and PFI lack a theory that relates them to the data generating process. We formalize PD and PFI as statistical estimators of ground truth estimands rooted in the data generating process. We show that PD and PFI estimates deviate from this ground truth due to statistical biases, model variance and Monte Carlo approximation errors. To account for model variance in PD and PFI estimation, we propose the learner-PD and the learner-PFI based on model refits, and propose corrected variance and confidence interval estimators.

show abstract

Section: Related Worksupporting

confidence: 75%

Section: Permutation Feature Importance (Pfi)mentioning

confidence: 99%

Section: Permutation Feature Importance (Pfi)mentioning

confidence: 99%

mentioning

confidence: 99%

See 2 more Smart Citations

Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

Molnar,

Freiesleben,

König

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…Hence, interpretations in these regions might be misleading. To avoid this problem, alternatives based on conditional distributions or refitting have been suggested (e.g., Strobl et al, 2008;Nicodemus et al, 2010;Hooker and Mentch, 2019;Watson and Wright, 2019;Molnar et al, 2020). Although the conditional PFI provides a solution to this problem, the interpretation of the score changes.…”

Section: Related Workmentioning

confidence: 99%

Grouped Feature Importance and Combined Features Effect Plot

Au¹,

Herbinger²,

Stachl³

et al. 2021

Preprint

View full text Add to dashboard Cite

Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effect of feature groups. To address this research gap, we provide a comprehensive overview of how existing model-agnostic techniques can be defined for feature groups to assess the grouped feature importance, focusing on permutation-based, refitting, and Shapley-based methods. We also introduce an importance-based sequential procedure that identifies a stable and well-performing combination of features in the grouped feature space. Furthermore, we introduce the combined features effect plot, which is a technique to visualize the effect of a group of features based on a sparse, interpretable linear combination of features. We used simulation studies and a real data example from computational psychology to analyze, compare, and discuss these methods.

show abstract

Calibrating machine learning approaches for probability estimation: A comprehensive comparison

Ojeda,

Jansen,

Thiéry

et al. 2023

Statistics in Medicine

View full text Add to dashboard Cite

Statistical prediction models have gained popularity in applied research. One challenge is the transfer of the prediction model to a different population which may be structurally different from the model for which it has been developed. An adaptation to the new population can be achieved by calibrating the model to the characteristics of the target population, for which numerous calibration techniques exist. In view of this diversity, we performed a systematic evaluation of various popular calibration approaches used by the statistical and the machine learning communities for estimating two‐class probabilities. In this work, we first provide a review of the literature and, second, present the results of a comprehensive simulation study. The calibration approaches are compared with respect to their empirical properties and relationships, their ability to generalize precise probability estimates to external populations and their availability in terms of easy‐to‐use software implementations. Third, we provide code from real data analysis allowing its application by researchers. Logistic calibration and beta calibration, which estimate an intercept plus one and two slope parameters, respectively, consistently showed the best results in the simulation studies. Calibration on logit transformed probability estimates generally outperformed calibration methods on nontransformed estimates. In case of structural differences between training and validation data, re‐estimation of the entire prediction model should be outweighted against sample size of the validation data. We recommend regression‐based calibration approaches using transformed probability estimates, where at least one slope is estimated in addition to an intercept for updating probability estimates in validation studies.

show abstract

Testing conditional independence in supervised learning algorithms

Cited by 32 publications

References 61 publications

Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

Grouped Feature Importance and Combined Features Effect Plot

Calibrating machine learning approaches for probability estimation: A comprehensive comparison

Contact Info

Product

Resources

About