Explainable artificial intelligence (XAI) methods help researchers shed light on the reasons behind the predictions made by deep neural networks (DNNs). XAI methods have already been successfully applied in climate science, revealing underlying physical mechanisms inherent in the studied data. However, evaluating and validating XAI performance is challenging because explanation methods often lack a ground truth. As the number of XAI methods grows, a comprehensive evaluation is necessary to enable well-founded XAI application in climate science.

In this work we introduce explanation evaluation in the context of climate research. We apply XAI evaluation to compare multiple explanation methods for a multi-layer perceptron (MLP) and a convolutional neural network (CNN), both of which assign temperature maps to classes based on their decade. We assess the respective explanation methods using evaluation metrics that measure robustness, faithfulness, randomization, complexity and localization. Based on the results of a random baseline test, we establish an explanation evaluation guideline for the climate community. We use this guideline to rank the performance in each property of similar sets of explanation methods for the MLP and CNN. Independent of the network type, we find that Integrated Gradients, Layer-wise Relevance Propagation and InputGradients exhibit higher robustness, faithfulness and complexity than purely Gradient-based methods, while sacrificing reactivity to network parameters, i.e. they attain low randomization scores; the opposite holds for Gradient, SmoothGrad, NoiseGrad and FusionGrad. Another key observation is that explanations using input perturbations, such as SmoothGrad and Integrated Gradients, do not improve robustness and faithfulness, in contrast to theoretical claims. Our experiments highlight that XAI evaluation can be applied to different network tasks and offers more detailed information about the properties of explanation methods than previous research. We demonstrate that using XAI evaluation helps to tackle the challenge of choosing an explanation method.
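To make the evaluation setting concrete, the sketch below illustrates one of the properties mentioned above, robustness, as a max-sensitivity style check of a plain Gradient explanation. This is not the authors' pipeline: the toy CNN, the placeholder 72x144 temperature grid, the number of decade classes and all constants are assumptions made purely for illustration.

```python
import torch

def saliency(model, x, target):
    """Plain Gradient explanation: derivative of the target logit w.r.t. the input."""
    x = x.clone().requires_grad_(True)
    model(x)[:, target].sum().backward()
    return x.grad.detach()

def max_sensitivity(model, x, target, radius=0.05, n_samples=10):
    """Robustness proxy: largest relative change of the explanation under
    small uniform input perturbations (lower means more robust)."""
    base = saliency(model, x, target)
    worst = torch.tensor(0.0)
    for _ in range(n_samples):
        x_pert = x + radius * (2 * torch.rand_like(x) - 1)
        diff = saliency(model, x_pert, target) - base
        worst = torch.maximum(worst, diff.norm() / (base.norm() + 1e-12))
    return worst.item()

# Hypothetical usage with a toy CNN standing in for the decade classifier.
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 10),                     # e.g. ten decade classes
)
temperature_map = torch.randn(1, 1, 72, 144)    # placeholder lat/lon grid
print(max_sensitivity(model, temperature_map, target=3))
```

A low score indicates that small input perturbations barely change the attribution map, which is the notion of robustness probed here; analogous checks exist for faithfulness, randomization, complexity and localization.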
Explainable AI (XAI) is a rapidly evolving field that aims to improve the transparency and trustworthiness of AI systems to humans. One of the unsolved challenges in XAI is estimating the performance of explanation methods for neural networks, which has resulted in numerous competing metrics with little to no indication of which one is to be preferred. In this paper, to identify the most reliable evaluation method in a given explainability context, we propose MetaQuantus, a simple yet powerful framework that meta-evaluates two complementary performance characteristics of an evaluation method: its resilience to noise and its reactivity to randomness. We demonstrate the effectiveness of our framework through a series of experiments targeting various open questions in XAI, such as the selection of explanation methods and the optimisation of hyperparameters of a given metric. We release our work under an open-source license to serve as a development tool for XAI researchers and Machine Learning (ML) practitioners to verify and benchmark newly constructed metrics (i.e., "estimators" of explanation quality). With this work, we provide clear and theoretically grounded guidance for building reliable evaluation methods, thus facilitating standardisation and reproducibility in the field of XAI.
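As a rough illustration of the two complementary characteristics named above, the sketch below scores a stand-in quality estimator for its resilience to minor input noise and its reactivity to model randomisation. It does not use the MetaQuantus API; the toy estimator, the placeholder classifier and all constants are assumptions for illustration only.

```python
import copy
import torch

# Stand-in quality estimator: mean absolute gradient of the target logit
# w.r.t. the input. Any real metric (faithfulness, robustness, ...) could
# be plugged in instead.
def toy_estimator(model, x, target):
    x = x.clone().requires_grad_(True)
    model(x)[:, target].sum().backward()
    return x.grad.abs().mean().item()

def meta_evaluate(metric_fn, model, x, target, noise=0.01, n_trials=5):
    """Two complementary checks in the spirit of the paper: the metric's score
    should be stable under minor input noise (resilience) and should shift
    noticeably when the model's weights are randomised (reactivity)."""
    base = metric_fn(model, x, target)

    noisy_scores = torch.tensor(
        [metric_fn(model, x + noise * torch.randn_like(x), target)
         for _ in range(n_trials)])
    resilience = noisy_scores.std().item()          # smaller is better

    randomised = copy.deepcopy(model)
    with torch.no_grad():
        for p in randomised.parameters():
            p.normal_(std=0.1)
    reactivity = abs(metric_fn(randomised, x, target) - base)  # larger is better

    return {"baseline": base, "noise_spread": resilience, "random_shift": reactivity}

# Hypothetical usage with a placeholder classifier and input.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.randn(1, 1, 28, 28)
print(meta_evaluate(toy_estimator, model, x, target=0))
```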