2018
DOI: 10.1007/978-3-030-01364-6_13

Imperfect Segmentation Labels: How Much Do They Matter?

Abstract: Labeled datasets for semantic segmentation are imperfect, especially in medical imaging where borders are often subtle or ill-defined. Little work has been done to analyze the effect that label errors have on the performance of segmentation methodologies. Here we present a large-scale study of model performance in the presence of varying types and degrees of error in training data. We trained U-Net, SegNet, and FCN32 several times for liver segmentation with 10 different modes of ground-truth perturbation. Our r…
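The study described in the abstract trains segmentation networks on deliberately perturbed ground-truth masks. As a minimal sketch of what such an experiment can look like (the perturb_mask and dice helpers and the dilation/erosion-based jitter below are illustrative assumptions, not the paper's actual ten perturbation modes), one can corrupt a binary mask and quantify the damage with a Dice score:

import numpy as np
from scipy import ndimage


def perturb_mask(mask, iterations=2, seed=0):
    # Hypothetical boundary jitter: randomly dilate or erode the label a few
    # pixels, mimicking sloppy or biased annotation along organ borders.
    rng = np.random.default_rng(seed)
    if rng.random() < 0.5:
        return ndimage.binary_dilation(mask, iterations=iterations)
    return ndimage.binary_erosion(mask, iterations=iterations)


def dice(a, b):
    # Dice overlap between two binary masks (1.0 = identical).
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum() + 1e-8)


# Toy example: a synthetic circular "organ" mask and a perturbed copy of it.
yy, xx = np.mgrid[:128, :128]
clean = (yy - 64) ** 2 + (xx - 64) ** 2 < 30 ** 2
noisy = perturb_mask(clean, iterations=3)
print(f"Dice between clean and perturbed label: {dice(clean, noisy):.3f}")

Training the same model on clean versus perturbed masks and comparing test-set performance is, in essence, the comparison the paper carries out at scale for U-Net, SegNet, and FCN32.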

Cited by 25 publications (13 citation statements)
References 20 publications
“…This strategy is useful in mitigating the disadvantages of a trade-off between mask quality and large sample size, as encountered when using automatically generated training data (McClure et al., 2019). This is further reinforced by the robust performance of the trained CNN here, demonstrating that the approximate masks, despite individual flaws, allow the neural networks to successfully capture the structure’s properties (Heller et al., 2018).…”
Section: Discussion
confidence: 54%
“…Although the 3D version involves a higher computational load, which may limit the upper resolution of processed images, the inclusion of the additional dimension was shown to be of significant benefit to the segmentation (Chen et al., 2019; Mlynarski et al., 2020). Another reason for using the U-Net architecture was its reported robustness to jagged, boundary-localized errors (Heller et al., 2018), which is a helpful feature when training on automatically generated masks.…”
Section: Methods
confidence: 99%
“…Moreover, we acknowledge that manually correcting brain masks in a single case can take hours (Puccio et al.). Although our approach to generating a GT brain mask in a large-scale dataset was focused more on correcting major errors (e.g., around pathologies, resection cavities, or errors due to varying hardware or acquisition parameters), even imperfect GT labels can lead to high-quality deep-learning segmentation algorithms when using the U-Net architecture that was employed in our study (Heller, Dean, & Papanikolopoulos, 2018). Moreover, the competitiveness of our approach was demonstrated by testing on the public datasets (NFBS, CC-359, and LPBA40), where we confirmed the performance of the HD-BET algorithm against an independent high-quality GT.…”
Section: Discussion
confidence: 99%
“…Although deep learning is relatively robust to label noise,[48] perturbations along boundaries are particularly problematic for U-Net model training.[49] In prior studies, correcting contours, as opposed to contouring from scratch, has been shown to increase consistency and reduce inter-observer variation.[50] The two outliers with low Dice scores were in patients with anatomical abnormalities that are also likely correlated with high dose to the heart (Figure 7); in one case, this would have resulted in an error in dose quantification of over 6 Gray had it not been corrected (Figure 10).…”
Section: Discussion
confidence: 99%