Investigating the limited performance of a deep-learning-based SPECT denoising approach: an observer-study-based characterization

Yu, Zitong; Rahman, Md Ashequr; Jha, Abhinav K.

doi:10.1117/12.2613134

Cited by 7 publications

(8 citation statements)

References 28 publications

“…For example, a technical efficacy study may observe suboptimal performance of an AI-based denoising algorithm on the tumor-detection task. Then, the evaluation study could investigate the performance of the algorithm for different tumor properties (size/tumor-to-background ratio) on the detection task (66). This will provide insights on the working principles of the algorithm, thus improving the interpretability of the algorithm.…”

Section: Postdeployment Evaluation (mentioning)

confidence: 99%

Nuclear Medicine and Artificial Intelligence: Best Practices for Evaluation (the RELAINCE Guidelines)

et al. 2022

Self Cite

An important need exists for strategies to perform rigorous objective clinical-task-based evaluation of artificial intelligence (AI) algorithms for nuclear medicine. To address this need, we propose a 4-class framework to evaluate AI algorithms for promise, technical task-specific efficacy, clinical decision making, and postdeployment efficacy. We provide best practices to evaluate AI algorithms for each of these classes. Each class of evaluation yields a claim that provides a descriptive performance of the AI algorithm. Key best practices are tabulated as the RELAINCE (Recommendations for EvaLuation of AI for NuClear medicinE) guidelines. The report was prepared by the Society of Nuclear Medicine and Molecular Imaging AI Task Force Evaluation team, which consisted of nuclear-medicine physicians, physicists, computational imaging scientists, and representatives from industry and regulatory agencies.

“…[22] Recent work studying the effect of denoising using neural networks on detection performance in single positron emission computed tomography showed that denoising could decrease detection performance even if it improved RMSE and SSIM.[30] In another study using simulated images showed similar results.[31] Exploring applications where regularization helps in a detection-based perspective would be useful to better understand the regimes where metrics like ERMSE and detection performance agree and disagree.…”

Section: Results (mentioning)

confidence: 61%

“…Some slight improvement was seen in the context of ramp-spectrum noise for human observers [17] and in ideal observer performance in undersampled MRI [22]. Recent work studying the effect of denoising using neural networks on detection performance in single positron emission computed tomography showed that denoising could decrease detection performance even if it improved RMSE and SSIM [30]. In another study using simulated images showed similar results [31].…”

Section: Results (mentioning)

confidence: 81%

Modeling human observer detection in undersampled magnetic resonance imaging reconstruction with total variation and wavelet sparsity regularization

et al. 2023

Purpose: Task-based assessment of image quality in undersampled magnetic resonance imaging provides a way of evaluating the impact of regularization on task performance. In this work, we evaluated the effect of total variation (TV) and wavelet regularization on human detection of signals with a varying background and validated a model observer in predicting human performance. Approach: Human observer studies used two-alternative forced choice (2-AFC) trials with a small signal-known-exactly task but with varying backgrounds for fluid-attenuated inversion recovery images reconstructed from undersampled multi-coil data. We used a 3.48 undersampling factor with TV and wavelet sparsity constraints. The sparse difference-of-Gaussians (S-DOG) observer with internal noise was used to model human observer detection. The internal noise for the S-DOG was chosen to match the average percent correct (PC) in 2-AFC studies for four observers using no regularization. That S-DOG model was used to predict the PC of human observers for a range of regularization parameters. Results: We observed a trend that the human observer detection performance remained fairly constant for a broad range of values in the regularization parameter before decreasing at large values. A similar result was found for the normalized ensemble root mean squared error. Without changing the internal noise, the model observer tracked the performance of the human observers as the regularization was increased but overestimated the PC for large amounts of regularization for TV and wavelet sparsity, as well as the combination of both parameters. Conclusions: For the task we studied, the S-DOG observer was able to reasonably predict human performance with both TV and wavelet sparsity regularizers over a broad range of regularization parameters. We observed a trend that task performance remained fairly constant for a range of regularization parameters before decreasing for large amounts of regularization.
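The internal-noise calibration step described in this abstract can be illustrated with a minimal sketch. It is not the authors' S-DOG implementation: it assumes a scalar decision variable that is Gaussian with equal variance under both hypotheses, internal noise that adds independent variance, and the textbook 2-AFC relation PC = Φ(d'/√2). All function and parameter names (`pc_2afc`, `calibrate_internal_noise`, `delta_mean`, `sigma_ext`) are hypothetical.

```python
import math


def pc_2afc(d_prime):
    """Percent correct in 2-AFC for detectability d' (equal-variance Gaussian assumption)."""
    # PC = Phi(d'/sqrt(2)) = 0.5 * (1 + erf(d'/2))
    return 0.5 * (1.0 + math.erf(d_prime / 2.0))


def d_prime_with_internal_noise(delta_mean, sigma_ext, sigma_int):
    """Detectability when independent internal noise adds to the external (image) noise."""
    return delta_mean / math.sqrt(sigma_ext ** 2 + sigma_int ** 2)


def calibrate_internal_noise(delta_mean, sigma_ext, target_pc, hi=100.0, tol=1e-9):
    """Bisection search for the internal-noise level whose model PC matches target_pc.

    Assumes target_pc lies between the model PC at sigma_int = hi and at sigma_int = 0.
    """
    lo = 0.0
    if pc_2afc(d_prime_with_internal_noise(delta_mean, sigma_ext, lo)) < target_pc:
        raise ValueError("target PC exceeds the model PC without internal noise")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        pc_mid = pc_2afc(d_prime_with_internal_noise(delta_mean, sigma_ext, mid))
        if pc_mid > target_pc:
            lo = mid  # model still too good: more internal noise is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative numbers only (not taken from the study).
sigma_int = calibrate_internal_noise(delta_mean=2.0, sigma_ext=1.0, target_pc=0.80)
model_pc = pc_2afc(d_prime_with_internal_noise(2.0, 1.0, sigma_int))
```

In this sketch the internal noise is fitted once at a single operating point and then held fixed, mirroring the abstract's statement that the model was used to predict PC at other regularization settings without changing the internal noise.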

“…observed no significant correlation between peak signal‐to‐noise ratio and SSIM values and classification performance for a tumor classification task in chest radiographs [19]. A study by our group observed these limitations with a 2D single photon emission computed tomography (SPECT) system with lumpy background‐based tracer distribution models [21]. While these studies show the limitations of these FoMs, it is unclear if the results from these studies are indicators of performance in clinical settings in nuclear medicine.…”

Section: Introduction (mentioning)

confidence: 97%

Need for objective task‐based evaluation of deep learning‐based denoising methods: A study in the context of myocardial perfusion SPECT

Rahman, Laforest, et al. 2023

Medical Physics

Background: Artificial intelligence‐based methods have generated substantial interest in nuclear medicine. An area of significant interest has been the use of deep‐learning (DL)‐based approaches for denoising images acquired with lower doses, shorter acquisition times, or both. Objective evaluation of these approaches is essential for clinical application. Purpose: DL‐based approaches for denoising nuclear‐medicine images have typically been evaluated using fidelity‐based figures of merit (FoMs) such as root mean squared error (RMSE) and structural similarity index measure (SSIM). However, these images are acquired for clinical tasks and thus should be evaluated based on their performance in these tasks. Our objectives were to: (1) investigate whether evaluation with these FoMs is consistent with objective clinical‐task‐based evaluation; (2) provide a theoretical analysis for determining the impact of denoising on signal‐detection tasks; and (3) demonstrate the utility of virtual imaging trials (VITs) to evaluate DL‐based methods. Methods: A VIT to evaluate a DL‐based method for denoising myocardial perfusion SPECT (MPS) images was conducted. To conduct this evaluation study, we followed the recently published best practices for the evaluation of AI algorithms for nuclear medicine (the RELAINCE guidelines). An anthropomorphic patient population modeling clinically relevant variability was simulated. Projection data for this patient population at normal and low‐dose count levels (20%, 15%, 10%, 5%) were generated using well‐validated Monte Carlo‐based simulations. The images were reconstructed using a 3‐D ordered‐subsets expectation maximization‐based approach. Next, the low‐dose images were denoised using a commonly used convolutional neural network‐based approach. The impact of DL‐based denoising was evaluated using both fidelity‐based FoMs and area under the receiver operating characteristic curve (AUC), which quantified performance on the clinical task of detecting perfusion defects in MPS images as obtained using a model observer with anthropomorphic channels. We then provide a mathematical treatment to probe the impact of post‐processing operations on signal‐detection tasks and use this treatment to analyze the findings of this study. Results: Based on fidelity‐based FoMs, denoising using the considered DL‐based method led to significantly superior performance. However, based on ROC analysis, denoising did not improve, and in fact, often degraded detection‐task performance. This discordance between fidelity‐based FoMs and task‐based evaluation was observed at all the low‐dose levels and for different cardiac‐defect types. Our theoretical analysis revealed that the major reason for this degraded performance was that the denoising method reduced the difference in the means of the reconstructed images and of the channel operator‐extracted feature vectors between the defect‐absent and defect‐present cases. Conclusions: The results show the discrepancy between the evaluation of DL‐based methods with fidelity‐...
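Why a smaller difference in class means can lower detection performance is easiest to see from the standard textbook expression for a channelized Hotelling-type observer. The relations below are a generic sketch, not the paper's own derivation: the symbols (mean feature vectors under the defect-present and defect-absent hypotheses, their average covariance K) are introduced here for illustration, and the AUC formula additionally assumes a Gaussian-distributed test statistic with equal variance under both hypotheses.

```latex
% Assumed notation (not from the abstract):
%   \bar{\mathbf{v}}_1, \bar{\mathbf{v}}_0 : mean channel feature vectors, defect present / absent
%   \mathbf{K} : average covariance of the feature vectors over the two classes
\begin{align*}
  \Delta\bar{\mathbf{v}} &= \bar{\mathbf{v}}_1 - \bar{\mathbf{v}}_0, \\
  \mathrm{SNR}^2 &= \Delta\bar{\mathbf{v}}^{\mathsf{T}} \, \mathbf{K}^{-1} \, \Delta\bar{\mathbf{v}}, \\
  \mathrm{AUC} &= \frac{1}{2} + \frac{1}{2}\,\operatorname{erf}\!\left(\frac{\mathrm{SNR}}{2}\right).
\end{align*}
```

Under these assumptions, a post-processing step that shrinks the mean difference more than it shrinks the covariance lowers the SNR and hence the AUC, even while pixel-wise fidelity measures such as RMSE improve.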
