2021
DOI: 10.1148/ryai.2021200267

Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging

Abstract: To evaluate the trustworthiness of saliency maps for abnormality localization in medical imaging. Materials and Methods: Using two large publicly available radiology datasets (SIIM-ACR Pneumothorax Segmentation and RSNA Pneumonia Detection), we quantified the performance of eight commonly used saliency map techniques in regards to their 1) localization utility (segmentation and detection), 2) sensitivity to model weight randomization, 3) repeatability, and 4) reproducibility. We compared their performances vers…
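
To make the first criterion, localization utility, more concrete, here is a minimal sketch of one way to score how well a saliency map overlaps a ground-truth abnormality mask. It is an illustration only, not the metric or code used in the article: the function name `dice_overlap`, the `top_fraction` parameter, and the choice of a Dice score over the most salient pixels are all assumptions made for this example.

```python
def dice_overlap(saliency, mask, top_fraction=0.25):
    """Dice score between the most salient pixels and a binary mask.

    saliency: 2D list of floats (hypothetical saliency map).
    mask: 2D list of 0/1 values (hypothetical ground-truth segmentation).
    top_fraction: share of pixels treated as "salient" (an assumed cutoff).
    """
    flat_s = [v for row in saliency for v in row]
    flat_m = [bool(v) for row in mask for v in row]

    # Keep the k highest-valued pixels as the predicted region.
    k = max(1, round(top_fraction * len(flat_s)))
    ranked = sorted(range(len(flat_s)), key=lambda i: flat_s[i], reverse=True)
    top = set(ranked[:k])

    # Dice = 2 * |A ∩ B| / (|A| + |B|)
    overlap = sum(1 for i in top if flat_m[i])
    return 2.0 * overlap / (k + sum(flat_m))


# Toy 4x4 example: saliency concentrated in the top-left 2x2 block.
saliency = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
same_region = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
other_region = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]

print(dice_overlap(saliency, same_region))   # full overlap
print(dice_overlap(saliency, other_region))  # no overlap
```

A score near 1 means the highlighted region coincides with the annotated abnormality, and a score near 0 means the map points elsewhere. The other three criteria would instead compare pairs of saliency maps (for example, before and after weight randomization, or across retrained models) using an image-similarity measure.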

Cited by 158 publications (103 citation statements)
References 29 publications
“…As saliency maps have been shown to be not reliable in some cases, it is important to ensure their robustness against the model weights, label randomization, as well as their repeatability and localization relevance (Arun et al, 2021). Therefore, sanity checks were conducted, following Adebayo et al (2018).…”
Section: Methods (mentioning)
confidence: 99%
“…However, the interpretation of these results warrants additional scrutiny because recent studies emphasized that many popular saliency maps used to interpret CNN trained on medical imaging did not meet several key criteria for utility and robustness, highlighting the need for additional validation before clinical application. 45 47 For the alternative technique, a computer-aided diagnosis system that utilizes the complementary information from CNN-based and feature-based methods will need to be further developed. Also, qualitative analysis of the latest techniques to better obtain the activation map will be required.…”
Section: Discussion (mentioning)
confidence: 99%
“…Also, qualitative analysis of the latest techniques to better obtain the activation map will be required. 45 …”
Section: Discussion (mentioning)
confidence: 99%
“…Although existing works on XAI evaluation proposed many real-world application desiderata and evaluation metrics [65,49,71,21,32,2,27,20,24], there is not a canonical criterion on the goodness of explanation, and it is unknown which evaluation objectives are suitable for clinical applications. For the very limited emerging XAI evaluation works on medical image tasks, such as on retinal [63], endoscopic [19], and chest X-Ray [5] imaging tasks, the evaluation mainly focused on one criterion, which is how well the explanation agrees with clinical prior knowledge, without justification for the selection of such criterion and its clinical applicability. This evaluation criterion may be confounded by factors outside XAI methods themselves, such as model training and spurious patterns in the data, as detailed in §2.2.…”
Section: Introduction (mentioning)
confidence: 99%