2022
DOI: 10.1002/mp.15514

On the proper use of structural similarity for the robust evaluation of medical image synthesis models

Abstract: To propose good practices for using the structural similarity metric (SSIM) and reporting its value. SSIM is one of the most popular image quality metrics in use in the medical image synthesis community because of its alleged superiority over voxel-by-voxel measurements like the average error or the peak signal-to-noise ratio (PSNR). It has seen massive adoption since its introduction, but its limitations are often overlooked. Notably, SSIM is designed to work on a strictly positive intensity scale, which is gene…

Citations: cited by 12 publications (7 citation statements)
References: 37 publications
“…Furthermore, reporting uncorrected CBCT image quality is necessary to gauge the relative improvement. Similar recommendations were recently echoed for SSIM: The authors recommended reporting the average SSIM within the patient body contour and ensuring that an appropriate dynamic range is used 118 . As previously discussed, the FID is a popular image quality metric used in assessing generative models that lack a ground truth 94 .…”
Section: Discussion (mentioning)
confidence: 88%
“…Similar recommendations were recently echoed for SSIM: The authors recommended reporting the average SSIM within the patient body contour and ensuring that an appropriate dynamic range is used. 118 As previously discussed, the FID is a popular image quality metric used in assessing generative models that lack a ground truth. 94 The distance metric compares the distribution of deep features between two datasets irrespective of their pixel-wise alignment, thereby addressing the inherent flaw of using MAE.…”
Section: Recommendations For Researchers (mentioning)
confidence: 99%
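The recommendation quoted above, averaging SSIM within the patient body contour rather than over the whole image, can be illustrated with a minimal sketch. It assumes a per-voxel SSIM map and a binary body mask are already available as flat sequences; the function name and the numbers are hypothetical and chosen only to show how perfectly matching background can inflate a whole-image average.

```python
def mean_ssim_in_mask(ssim_map, body_mask):
    """Average a per-voxel SSIM map over the voxels flagged as inside the body."""
    inside = [s for s, m in zip(ssim_map, body_mask) if m]
    if not inside:
        raise ValueError("body mask selects no voxels")
    return sum(inside) / len(inside)


# Hypothetical flattened values: background voxels (mask 0) match perfectly.
ssim_map = [1.0, 1.0, 0.8, 0.6, 1.0]
body_mask = [0, 0, 1, 1, 0]

whole_image = sum(ssim_map) / len(ssim_map)      # background included
in_body = mean_ssim_in_mask(ssim_map, body_mask)  # body contour only
print(whole_image, in_body)
```

In this toy case the whole-image mean is higher than the in-body mean because the trivially matching background dominates the average.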
“…45 The maximum achievable value is 1, if and only if two images are identical. SSIM has been used for evaluating image processing results in different medical applications, such as image synthesis, 46 artifact suppression, 47 predicting post-contrast MR images from pre-contrast images, 48 predicting PET images from MRIs, 49 image generation using GANs. 50,51 The range of SSIM for these applications is wide (∼0.2 to ∼0.99).…”
Section: Discussion (mentioning)
confidence: 99%
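Two points in the statements above, that SSIM reaches 1 for identical images and that the assumed dynamic range matters, can be checked with a small sketch. It is not taken from the cited papers: it applies the widely used single-window SSIM formula with the conventional constants K1 = 0.01 and K2 = 0.03 to one flat window, whereas practical implementations average the index over many local windows. The function name and sample values are hypothetical.

```python
def ssim_single_window(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM index of two equally sized windows given as flat sequences.

    data_range is the assumed dynamic range L; it sets the stabilizing
    constants C1 = (k1 * L)^2 and C2 = (k2 * L)^2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("windows must have equal length of at least 2")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical intensity values for one window of a reference and a synthetic image.
reference = [100.0, 110.0, 120.0, 130.0]
synthetic = [105.0, 108.0, 125.0, 128.0]

identical = ssim_single_window(reference, reference, data_range=255)
narrow = ssim_single_window(reference, synthetic, data_range=255)
wide = ssim_single_window(reference, synthetic, data_range=4095)
print(identical, narrow, wide)
```

For this pair of windows, the larger assumed range enlarges C1 and C2 and pushes the score toward 1, which is one reason the dynamic range used should be reported alongside the SSIM value.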
“…The image quality was measured by the average SSIM and the normalized root‐mean‐square error (NRMSE) over the respiratory phases between the model reconstruction and the XD‐GRASP reconstruction. The NRMSE was computed as $\operatorname{NRMSE}(I_{\rm est}, I_{\rm target}) = \sqrt{1/M \sum (I_{\rm est} - I_{\rm target})^2} / \overline{\vert I_{\rm target}\vert}$, where $M$ is the number of voxels and $\overline{\vert I_{\rm target}\vert}$ is the mean absolute value of $I_{\rm target}$ within the anatomy 57 . The motion estimation quality was quantified in two ways: …”
Section: Methods (mentioning)
confidence: 99%
“…The NRMSE was computed as $\operatorname{NRMSE}(I_{\rm est}, I_{\rm target}) = \sqrt{1/M \sum (I_{\rm est} - I_{\rm target})^2} / \overline{\vert I_{\rm target}\vert}$, where $M$ is the number of voxels and $\overline{\vert I_{\rm target}\vert}$ is the mean absolute value of $I_{\rm target}$ within the anatomy. 57 The motion estimation quality was quantified in two ways:…”
Section: Training and Evaluation (mentioning)
confidence: 99%
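The NRMSE definition quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not the citing authors' code: images are flat sequences of voxel values, the squared error is summed over all M voxels, and only the normalizing mean of |I_target| is restricted to an optional anatomy mask. The quoted text does not say whether the error sum is also masked, so that choice is an assumption. The names `nrmse` and `anatomy_mask` are hypothetical.

```python
import math


def nrmse(est, target, anatomy_mask=None):
    """RMSE over all voxels divided by the mean |target| within the anatomy."""
    if len(est) != len(target) or not target:
        raise ValueError("est and target must be non-empty and equally sized")
    m = len(target)
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, target)) / m)
    if anatomy_mask is None:
        inside = list(target)  # no mask given: normalize over every voxel
    else:
        inside = [t for t, keep in zip(target, anatomy_mask) if keep]
    if not inside:
        raise ValueError("anatomy mask selects no voxels")
    mean_abs = sum(abs(t) for t in inside) / len(inside)
    if mean_abs == 0:
        raise ValueError("mean absolute target value is zero")
    return rmse / mean_abs


# Hypothetical voxel values: a single voxel is off by 2.
est = [1.0, 2.0, 3.0, 4.0]
target = [1.0, 2.0, 3.0, 6.0]

unmasked = nrmse(est, target)                          # sqrt(4/4) / 3
masked = nrmse(est, target, anatomy_mask=[0, 1, 1, 1])  # sqrt(4/4) / (11/3)
print(unmasked, masked)
```

Restricting the normalization to the anatomy raises the mean absolute target value in this example, so the masked NRMSE comes out lower than the unmasked one.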