Saliency methods, which produce heat maps that highlight the areas of the medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. However, rigorous investigation of the accuracy and reliability of these strategies is necessary before they are integrated into the clinical setting. In this work, we quantitatively evaluate seven saliency methods, including Grad-CAM, across multiple neural network architectures using two evaluation metrics. We establish the first human benchmark for chest X-ray segmentation in a multilabel classification set-up, and examine under what clinical conditions saliency maps might be more prone to failure in localizing important pathologies compared with a human expert benchmark. We find that (1) while Grad-CAM generally localized pathologies better than the other evaluated saliency methods, all seven performed significantly worse compared with the human benchmark, (2) the gap in localization performance between Grad-CAM and the human benchmark was largest for pathologies that were smaller in size and had shapes that were more complex, and (3) model confidence was positively correlated with Grad-CAM localization performance. Our work demonstrates that several important limitations of saliency methods must be addressed before we can rely on them for deep learning explainability in medical imaging.
Deep learning has enabled automated medical image interpretation at a level often surpassing that of practicing medical experts. However, many clinical practices have cited a lack of model interpretability as a reason to delay the use of "black-box" deep neural networks in clinical workflows. Saliency maps, which "explain" a model's decision by producing heat maps that highlight the areas of the medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. In this work, we demonstrate that the most commonly used saliency map generating method, Grad-CAM, results in low localization performance for 10 pathologies on chest X-rays. We examined under what clinical conditions saliency maps might be more dangerous to use compared to human experts, and found that Grad-CAM performs worse for pathologies that had multiple instances, were smaller in size, and had shapes that were more complex. Moreover, we showed that model confidence was positively correlated with Grad-CAM localization performance, suggesting that saliency maps were safer for clinicians to use as a decision aid when the model had made a positive prediction with high confidence. Our work demonstrates that several important limitations of interpretability techniques for medical imaging must be addressed before use in clinical workflows.
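To make "localization performance" concrete, here is a minimal sketch of one common way to score a saliency heat map against an expert-drawn segmentation: threshold the heat map into a binary mask and compute intersection over union (IoU). The abstract does not name its two evaluation metrics, so the choice of IoU, the 0.5 threshold, and the toy 4x4 arrays are all assumptions made for illustration only.

```python
def binarize(heatmap, threshold):
    """Turn a saliency heat map (rows of floats) into a binary mask."""
    return [[value >= threshold for value in row] for row in heatmap]


def iou(pred_mask, true_mask):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += bool(p and t)
            union += bool(p or t)
    # Convention assumed here: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Hypothetical toy data: a 4x4 heat map and an expert segmentation mask.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.0],
    [0.0, 0.1, 0.1, 0.0],
]
expert_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

# Threshold of 0.5 is an illustrative assumption, not a value from the paper.
score = iou(binarize(heatmap, 0.5), expert_mask)
print(f"IoU = {score:.3f}")
```

In this toy case the thresholded map covers four of the six expert-marked pixels and nothing outside them, so the overlap is partial, which is the kind of shortfall relative to a human benchmark that the abstract describes.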
Background: In an effort to provide greater financial protection from the risk of large medical expenditures, China has gradually added catastrophic medical insurance (CMI) to the various basic insurance schemes. Tongxiang, a rural county in Zhejiang province, China, has had CMI since 2000 for its employee insurance scheme, and since 2014 for its resident insurance scheme. Methods: Compiling and analysing patient-level panel data over five years, we use a difference-in-differences approach to study the effect of the 2014 introduction of CMI for resident insurance beneficiaries in Tongxiang. In our study design, resident insurance beneficiaries are the treatment group, while employee insurance beneficiaries are the control group. Findings: We find that availability of CMI significantly increases medical expenditures among resident insurance beneficiaries, for both inpatient and outpatient spending. Despite the greater financial protection, out-of-pocket expenditures increased, in part because patients accessed treatment more often at higher-level hospitals. Interpretation: Better financial coverage for catastrophic medical expenditures led to greater access and expenditures, not only for inpatient admissions (the category that most often leads to catastrophic expenditures) but for outpatient visits as well. These patterns of expenditure change with CMI may reflect both enhanced access to a patient's preferred site of care and the influence of incentives encouraging more care under fee-for-service payment.
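The logic of the study design can be sketched with the simplest two-group, two-period form of the estimator: the change in the treatment group minus the change in the control group. The snippet below is a minimal illustration under that assumption. The function name `did_estimate` and all expenditure figures are hypothetical, and an analysis of patient-level panel data would more likely use a regression with additional controls.

```python
from statistics import mean


def did_estimate(records):
    """Two-by-two difference-in-differences estimate.

    records: iterable of (group, period, outcome) tuples, where group is
    "treated" or "control" and period is "pre" or "post".
    """
    cells = {}
    for group, period, outcome in records:
        cells.setdefault((group, period), []).append(outcome)
    avg = {key: mean(values) for key, values in cells.items()}

    treated_change = avg[("treated", "post")] - avg[("treated", "pre")]
    control_change = avg[("control", "post")] - avg[("control", "pre")]
    # The control group's change stands in for what would have happened
    # to the treated group without the policy (parallel-trends assumption).
    return treated_change - control_change


# Made-up expenditures for illustration only; not data from the study.
records = [
    ("treated", "pre", 100), ("treated", "pre", 120),
    ("treated", "post", 150), ("treated", "post", 170),
    ("control", "pre", 200), ("control", "pre", 220),
    ("control", "post", 220), ("control", "post", 240),
]

effect = did_estimate(records)
print(f"Estimated effect: {effect}")
```

Here the treated group (resident insurance beneficiaries in the study's framing) rises by 50 on average and the control group (employee insurance beneficiaries) by 20, so the estimate attributes the remaining 30 to the policy.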