Neural Prototype Trees for Interpretable Fine-grained Image Recognition

Nauta, Meike; Bree, Ron van; Seifert, Christin

doi:10.1109/cvpr46437.2021.01469

Cited by 166 publications

(115 citation statements)

References 50 publications

(59 reference statements)


“…Representative examples, including concepts [127], influential training instances [90], prototypical parts [36,179], nearest neighbors and criticisms [125].…”

Section: Prototypes (Parts Of), mentioning

confidence: 99%

From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

Nauta, Trienes, Pathak et al. 2022

Preprint

Self Cite


The rising popularity of explainable artificial intelligence (XAI) for understanding high-performing black boxes has also raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called Co-12 properties serve as a categorization scheme for systematically reviewing the evaluation practice of more than 300 papers published in the last 7 years at major AI and ML conferences that introduce an XAI method. We find that 1 in 3 papers evaluate exclusively with anecdotal evidence, and 1 in 5 papers evaluate with users. We also contribute to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. This systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark and compare new and existing XAI methods. This also opens up opportunities to include quantitative metrics as optimization criteria during model training in order to optimize for accuracy and interpretability simultaneously.


“…The post-hoc methods are Grad-CAM [23], Grad-CAM++ [4], RISE [22], Score-CAM [31], Ablation CAM [7]. The architectures with native attention are B-CNN [13], BR-NPA [10], the model from [14] which we call IBP (short for Interpretability By Parts), ProtoPNet [5], and ProtoTree [20]. These attention models generate several saliency maps (or attention maps) per input image but the metrics are designed for a single saliency map per image.…”

Section: Benchmark, mentioning

confidence: 99%

Metrics for saliency map evaluation of deep learning explanation methods

Gomez, Fréour, Mouchère 2022

Preprint


Due to the black-box nature of deep learning models, there is a recent development of solutions for visual explanations of CNNs. Given the high cost of user studies, metrics are necessary to compare and evaluate these different methods. In this paper, we critically analyze the Deletion Area Under Curve (DAUC) and Insertion Area Under Curve (IAUC) metrics proposed by Petsiuk et al. (2018). These metrics were designed to evaluate the faithfulness of saliency maps generated by generic methods such as Grad-CAM or RISE. First, we show that the actual saliency score values given by the saliency map are ignored as only the ranking of the scores is taken into account. This shows that these metrics are insufficient by themselves, as the visual appearance of a saliency map can change significantly without the ranking of the scores being modified. Secondly, we argue that during the computation of DAUC and IAUC, the model is presented with images that are out of the training distribution which might lead to an unreliable behavior of the model being explained. To complement DAUC/IAUC, we propose new metrics that quantify the sparsity and the calibration of explanation methods, two previously unstudied properties. Finally, we give general remarks about the metrics studied in this paper and discuss how to evaluate them in a user study.


“…Building inherently interpretable models, beyond post hoc approaches, is our key challenge here [34]. There have been several recent efforts [6,18,28,46,54], but most of them concentrate on enhancing interpretability only in the last layers of the neural network. In [46], the final linear layer is replaced with a differentiable decision tree, and in [54] a loss is used to make each filter of the very high-level convolutional layer represent a specific object part.…”

Section: Related Work, mentioning

confidence: 99%

Coded ResNeXt: a network for designing disentangled information paths

Avranas, Kountouris 2022

Preprint


To avoid treating neural networks as highly complex black boxes, the deep learning research community has tried to build interpretable models allowing humans to understand the decisions taken by the model. Unfortunately, the focus is mostly on manipulating only the very high-level features associated with the last layers. In this work, we look at neural network architectures for classification in a more general way and introduce an algorithm which defines before the training the paths of the network through which the per-class information flows. We show that using our algorithm we can extract a lighter single-purpose binary classifier for a particular class by removing the parameters that do not participate in the predefined information path of that class, which is approximately 60% of the total parameters. Notably, leveraging coding theory to design the information paths enables us to use intermediate network layers for making early predictions without having to evaluate the full network. We demonstrate that a slightly modified ResNeXt model, trained with our algorithm, can achieve higher classification accuracy on CIFAR-10/100 and ImageNet than the original ResNeXt, while having all the aforementioned properties.
