2022
DOI: 10.1186/s13104-022-06096-y

Towards a guideline for evaluation metrics in medical image segmentation

Abstract: In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies demonstrated that these models have powerful prediction capabilities and achieved similar results as clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and showed statistical bias by incorrect metric implementation or usage. Thus, this work provides a…

Cited by 203 publications (95 citation statements)
References 41 publications
“…This equation is based on the following sub-metrics: IoU, recall, and precision. According to [ 56 ], these metrics are well suited for medical image segmentation, along with the F1 score or Dice similarity coefficient, which is slightly different from IoU because this one penalizes under- and over-segmentation more than the Dice does. The Dice coefficient is often used to quantify the performance of image segmentation methods.…”
Section: Evaluations and Results (mentioning)
confidence: 99%
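The contrast between Dice and IoU drawn in the statement above can be made concrete with a short worked relation. This is a sketch assuming the standard confusion-matrix definitions; the symbols TP, FP, and FN (true positive, false positive, and false negative pixel counts) are introduced here for illustration and are not notation from the citing paper.

```latex
\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN},
\qquad
\mathrm{IoU} = \frac{TP}{TP + FP + FN}
\quad\Longrightarrow\quad
\mathrm{DSC} = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}} \;\ge\; \mathrm{IoU}
```

Because IoU lies between 0 and 1, the Dice score of a given prediction is never lower than its IoU, which is consistent with the statement that IoU penalizes under- and over-segmentation more strongly.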
“…The results are obtained using the developed deep learning architecture on the PH2 database images [24,25] 5 and Table 1 shows the performance metrics used to show the effectiveness of the proposed system. The commonly used performance metrics [26,27], such as accuracy, sensitivity and specificity are employed in this study. To attain better performance, the proposed DermICNet for SCD requires more images, so data augmentation [28][29][30] is employed.…”
Section: Results (mentioning)
confidence: 99%
“…The trained model generates segmented binary masks for the test images, which are then compared to the ground truth binary mask for the test images. The proposed model’s performance is evaluated and compared to that of other segmentation models using the most commonly used evaluation matrices for semantic segmentation, including the dice similarity coefficient (DSC), intersection over union (IoU), accuracy, recall, precision, and specificity [ 41 , 42 ].…”
Section: Methods (mentioning)
confidence: 99%
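The metrics named in the statement above (DSC, IoU, accuracy, recall, precision, specificity) can all be derived from the four confusion-matrix counts of a predicted and a ground-truth binary mask. The following is a minimal sketch under the standard definitions, not the implementation used by any of the citing papers; the function names `confusion_counts` and `segmentation_metrics` are hypothetical, and masks are assumed to be given as lists of rows with 0/1 entries.

```python
from itertools import chain


def _flatten(mask):
    """Flatten a 2D mask (iterable of rows) into a list of booleans."""
    return [bool(v) for v in chain.from_iterable(mask)]


def confusion_counts(pred, truth):
    """Return (TP, FP, FN, TN) pixel counts for two binary masks."""
    p, t = _flatten(pred), _flatten(truth)
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(a and b for a, b in zip(p, t))
    fp = sum(a and not b for a, b in zip(p, t))
    fn = sum(not a and b for a, b in zip(p, t))
    tn = sum(not a and not b for a, b in zip(p, t))
    return tp, fp, fn, tn


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero (undefined metric)."""
    return num / den if den else float("nan")


def segmentation_metrics(pred, truth):
    """Compute common segmentation metrics from two binary masks."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "recall": _ratio(tp, tp + fn),        # also called sensitivity
        "precision": _ratio(tp, tp + fp),
        "specificity": _ratio(tn, tn + fp),
    }
```

Accuracy and specificity both count true negatives, so on masks dominated by background they can stay high even when the segmented region is poor. Reporting them alongside DSC and IoU, as the statement above does, makes that effect visible.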