Xavier Renard scite author profile

Post-hoc interpretability approaches have been proven to be powerful tools to generate explanations for the predictions made by a trained black-box model. However, they create the risk of having explanations that are a result of some artifacts learned by the model instead of actual knowledge from the data. This paper focuses on the case of counterfactual explanations and asks whether the generated instances can be justified, i.e. continuously connected to some ground-truth data. We evaluate the risk of generating unjustified counterfactual examples by investigating the local neighborhoods of instances whose predictions are to be explained and show that this risk is quite high for several datasets. Furthermore, we show that most state-of-the-art approaches do not differentiate justified from unjustified counterfactual examples, leading to less useful explanations.
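The notion of a counterfactual being "continuously connected to some ground-truth data" can be illustrated with a small sketch. This is not the authors' implementation: it approximates continuity by a chain of points that all receive the same predicted class and lie at most `eps` apart. The function name `is_justified`, the toy classifier `predict`, the grid of `candidates`, and the value of `eps` are all assumptions made for illustration.

```python
import numpy as np
from collections import deque

def is_justified(predict, counterfactual, train_X, train_y, candidates, eps):
    """Approximate check: can the counterfactual reach a correctly classified
    training instance of the same class through a chain of same-class points
    whose consecutive distances are at most eps? (Illustrative sketch only.)"""
    target = predict(counterfactual[None, :])[0]
    # Ground-truth anchors: training points whose true AND predicted class is target.
    anchors = train_X[(train_y == target) & (predict(train_X) == target)]
    if len(anchors) == 0:
        return False
    # Only candidate points predicted as the target class may appear in the chain.
    same = candidates[predict(candidates) == target]
    nodes = np.vstack([counterfactual[None, :], same, anchors])
    first_anchor = 1 + len(same)
    visited = np.zeros(len(nodes), dtype=bool)
    visited[0] = True
    queue = deque([0])
    while queue:  # breadth-first search over the eps-neighbourhood graph
        i = queue.popleft()
        if i >= first_anchor:
            return True
        dist = np.linalg.norm(nodes - nodes[i], axis=1)
        for j in np.flatnonzero((dist <= eps) & ~visited):
            visited[j] = True
            queue.append(j)
    return False

# Hypothetical toy classifier: class 1 if x0 > 1, plus a small isolated
# "artifact" island of class 1 around (-2, 0) that no training data supports.
def predict(X):
    X = np.atleast_2d(X)
    island = np.linalg.norm(X - np.array([-2.0, 0.0]), axis=1) < 0.5
    return ((X[:, 0] > 1.0) | island).astype(int)

train_X = np.array([[1.5, 0.0], [2.0, 0.5], [-1.0, 0.0], [0.0, 0.5]])
train_y = np.array([1, 1, 0, 0])
gx, gy = np.meshgrid(np.linspace(-3, 3, 61), np.linspace(-1, 1, 21))
candidates = np.column_stack([gx.ravel(), gy.ravel()])

near_data = is_justified(predict, np.array([1.2, 0.0]), train_X, train_y, candidates, eps=0.15)
in_island = is_justified(predict, np.array([-2.0, 0.0]), train_X, train_y, candidates, eps=0.15)
```

In this toy setting, a counterfactual inside the isolated island has no same-class chain leading to any training instance, which is the kind of situation the abstract calls unjustified. The result depends on how densely `candidates` covers the space and on the choice of `eps`.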

Comparison-Based Inverse Classification for Interpretability in Machine Learning

Laugel, Lesot, Marsala et al. 2018

105

In the context of post-hoc interpretability, this paper addresses the task of explaining the prediction of a classifier, considering the case where no information is available, neither on the classifier itself, nor on the processed data (neither the training nor the test data). It proposes an instance-based approach whose principle consists in determining the minimal changes needed to alter a prediction: given a data point whose classification must be explained, the proposed method consists in identifying a close neighbour classified differently, where the closeness definition integrates a sparsity constraint. This principle is implemented using observation generation in the Growing Spheres algorithm. Experimental results on two datasets illustrate the relevance of the proposed approach that can be used to gain knowledge about the classifier.
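The principle described above (find a close neighbour that is classified differently, then favour sparse changes) can be sketched as follows. This is a simplified illustration, not the published Growing Spheres code: it only grows spherical layers outwards and then greedily resets features. The names `sample_in_shell`, `find_close_enemy`, `sparsify`, the step size, the sample count, and the toy classifier are assumptions. The classifier is accessed only through a `predict` function, in line with the setting where nothing else is known about it.

```python
import numpy as np

def sample_in_shell(center, r_low, r_high, n, rng):
    """Draw n points uniformly from the shell r_low <= ||z - center|| <= r_high."""
    d = center.shape[0]
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii are drawn so that points are uniform in volume within the shell.
    u = rng.uniform(size=n)
    radii = (u * (r_high**d - r_low**d) + r_low**d) ** (1.0 / d)
    return center + directions * radii[:, None]

def find_close_enemy(predict, x, step=0.1, n=500, max_layers=200, seed=0):
    """Grow layers around x until a sampled point gets a different prediction;
    return the closest such point, or None if none is found."""
    rng = np.random.default_rng(seed)
    label = predict(x[None, :])[0]
    for k in range(max_layers):
        z = sample_in_shell(x, k * step, (k + 1) * step, n, rng)
        enemies = z[predict(z) != label]
        if len(enemies) > 0:
            dists = np.linalg.norm(enemies - x, axis=1)
            return enemies[np.argmin(dists)]
    return None

def sparsify(predict, x, enemy):
    """Greedily reset features to their original value, smallest change first,
    keeping a reset only if the prediction still differs from that of x."""
    label = predict(x[None, :])[0]
    e = enemy.copy()
    for i in np.argsort(np.abs(e - x)):
        trial = e.copy()
        trial[i] = x[i]
        if predict(trial[None, :])[0] != label:
            e = trial
    return e

# Hypothetical black-box classifier: only the first two features matter.
def predict_toy(X):
    return (X[:, 0] + X[:, 1] > 1.0).astype(int)

x = np.array([0.2, 0.3, 0.5])
enemy = find_close_enemy(predict_toy, x)
sparse_enemy = sparsify(predict_toy, x, enemy)
changed = np.flatnonzero(sparse_enemy != x)  # indices of features that had to move
```

The features left in `changed` suggest which inputs the classifier is sensitive to near `x`. With this toy classifier the third feature is always reset, since it never influences the prediction.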

The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Laugel, Lesot, Marsala et al. 2019

Preprint

Unjustified Classification Regions and Counterfactual Explanations in Machine Learning

Laugel, Lesot, Marsala et al. 2020

Defining Locality for Surrogates in Post-hoc Interpretablity

Laugel, Renard, Lesot et al. 2018

Preprint
