SVEA: A Small-scale Benchmark for Validating the Usability of Post-hoc Explainable AI Solutions in Image and Signal Recognition

Sattarzadeh, Sam; Sudhakar, M.; Plataniotis, Konstantinos N.

doi:10.1109/iccvw54120.2021.00462

Cited by 7 publications

(4 citation statements)

References 14 publications


“…There are specialized benchmarks in the literature, like the SVEA benchmark [29]. The latter focuses on computer vision tasks and proposes faster evaluations based on the small mnist-1D dataset [30].…”

Section: Benchmark for XAI Algorithms (mentioning; confidence: 99%)

Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark

Belaid, Hüllermeier, Rabus, et al. (2022). Preprint.


In recent years, Explainable AI (xAI) has attracted considerable attention as several countries have turned explanations into a legal right. xAI makes it possible to improve models beyond the accuracy metric by, e.g., debugging learned patterns and demystifying the AI's behavior. The widespread use of xAI has brought new challenges. On the one hand, the number of published xAI algorithms has boomed, making it difficult for practitioners to select the right tool. On the other hand, some experiments have highlighted how easily data scientists can misuse xAI algorithms and misinterpret their results. To tackle the issue of comparing and correctly using feature-importance xAI algorithms, we propose Compare-xAI, a benchmark that unifies all exclusive and unitary evaluation methods applied to xAI algorithms. We propose a selection protocol to shortlist non-redundant unit tests from the literature, each targeting a specific problem in explaining a model. The benchmark encapsulates the complexity of evaluating xAI methods into a three-level hierarchical scoring that targets three end-user groups: researchers, practitioners, and laymen in xAI. The most detailed level provides one score per unit test. The second level regroups the tests into five categories (fidelity, fragility, stability, simplicity, and stress tests). The last level is the aggregated comprehensibility score, which captures in a single, easy-to-compare value how easily the algorithm's output can be interpreted correctly. Compare-xAI's interactive user interface helps mitigate errors in interpreting xAI results by quickly listing the recommended xAI solutions for each ML task and their current limitations. The benchmark is available at https://karim-53.github.io/cxAI/ (preprint, under review).



“…For instance, Sokol and Flach suggest a taxonomy defining criteria an explanatory method has to satisfy to be considered usable, summarized in an "Explainability Fact Sheet". This theoretical groundwork sparked generation of several practical validation frameworks, focusing on function level validation of explanation approaches [3,8,50,51,59]. For instance, these frameworks evaluate explanations in terms of their accuracy and fidelity [3,51,59,68], or robustness [8].…”

Section: Introduction (mentioning; confidence: 99%)


Let's Go to the Alien Zoo: Introducing an Experimental Framework to Study Usability of Counterfactual Explanations for Machine Learning

Kuhl, Artelt, Hammer (2022). Preprint.


To foster usefulness and accountability of machine learning (ML), it is essential to explain a model's decisions in addition to evaluating its performance. Accordingly, the field of explainable artificial intelligence (XAI) has resurfaced as a topic of active research, offering approaches to address the "how" and "why" of automated decision-making. Within this domain, counterfactual explanations (CFEs) have gained considerable traction as a psychologically grounded approach to generating post-hoc explanations. To do so, CFEs highlight what changes to a model's input would have changed its prediction in a particular way. However, despite the introduction of numerous CFE approaches, their usability has yet to be thoroughly validated at the human level. Thus, to advance the field of XAI, we introduce the Alien Zoo, an engaging, web-based, and game-inspired experimental framework. The Alien Zoo provides the means to evaluate the usability of CFEs for gaining new knowledge from an automated system, targeting novice users in a domain-general context. As a proof of concept, we demonstrate the practical efficacy and feasibility of this approach in a user study. Our results suggest that users benefit from receiving CFEs compared to no explanation, both in terms of objective performance in the proposed iterative learning task and in terms of subjective usability. With this work, we aim to equip research groups and practitioners with the means to easily run controlled and well-powered user studies that complement their often more technology-oriented work. In the interest of reproducible research, we provide the entire code, together with the underlying models and user data: https://github.com/ukuhl/IntroAlienZoo. This work is licensed under a Creative Commons "Attribution 4.0 International" license.


“…Alongside novel explainability approaches, authors have proposed evaluation criteria and guidelines to systematically assess XAI approaches in terms of their usability (Doshi-Velez and Kim, 2017; Arrieta et al, 2020; Davis et al, 2020; Sokol and Flach, 2020a). This theoretical groundwork sparked several practical validation frameworks, commonly evaluating explanations in terms of accuracy and fidelity (White and d'Avila Garcez, 2020; Pawelczyk et al, 2021; Sattarzadeh et al, 2021; Arras et al, 2022), or robustness (Artelt et al, 2021). However, while XAI taxonomies repeatedly emphasize the need for human-level validation of explanation approaches (Doshi-Velez and Kim, 2017; Sokol and Flach, 2020a), user evaluations of XAI approaches often face limitations concerning statistical power and reproducibility (Keane et al, 2021).…”

(mentioning; confidence: 99%)

Let's go to the Alien Zoo: Introducing an experimental framework to study usability of counterfactual explanations for machine learning

Kuhl, Artelt, Hammer (2023). Front. Comput. Sci.


Introduction. To foster usefulness and accountability of machine learning (ML), it is essential to explain a model's decisions in addition to evaluating its performance. Accordingly, the field of explainable artificial intelligence (XAI) has resurfaced as a topic of active research, offering approaches to address the “how” and “why” of automated decision-making. Within this domain, counterfactual explanations (CFEs) have gained considerable traction as a psychologically grounded approach to generating post-hoc explanations. To do so, CFEs highlight what changes to a model's input would have changed its prediction in a particular way. However, despite the introduction of numerous CFE approaches, their usability has yet to be thoroughly validated at the human level.

Methods. To advance the field of XAI, we introduce the Alien Zoo, an engaging, web-based, and game-inspired experimental framework. The Alien Zoo provides the means to evaluate the usability of CFEs for gaining new knowledge from an automated system, targeting novice users in a domain-general context. As a proof of concept, we demonstrate the practical efficacy and feasibility of this approach in a user study.

Results. Our results suggest the efficacy of the Alien Zoo framework for empirically investigating aspects of counterfactual explanations in a game-type scenario and a low-knowledge domain. The proof-of-concept study reveals that users benefit from receiving CFEs compared to no explanation, both in terms of objective performance in the proposed iterative learning task and in terms of subjective usability.

Discussion. With this work, we aim to equip research groups and practitioners with the means to easily run controlled and well-powered user studies that complement their often more technology-oriented work. In the interest of reproducible research, we provide the entire code, together with the underlying models and user data: https://github.com/ukuhl/IntroAlienZoo.
