Despite recent improvements in overall accuracy, deep learning systems still exhibit low levels of robustness. Detecting possible failures is critical for the successful clinical integration of these systems, where each data point corresponds to an individual patient. Uncertainty measures are a promising direction for improving failure detection, since they provide a measure of a system's confidence. Although many uncertainty estimation methods have been proposed for deep learning, little is known about their benefits and current challenges for medical image segmentation. Therefore, we report the results of evaluating common voxel-wise uncertainty measures with respect to their reliability and limitations on two medical image segmentation datasets. Results show that current uncertainty methods perform similarly and that, although they are well calibrated at the dataset level, they tend to be miscalibrated at the subject level. The reliability of the uncertainty estimates is therefore compromised, which highlights the importance of developing subject-wise uncertainty estimations. Additionally, among the benchmarked methods, we found auxiliary networks to be a valid alternative to common uncertainty methods, since they can be applied to any previously trained segmentation model.
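The gap between dataset-level and subject-level calibration described above can be illustrated with a small sketch. It assumes expected calibration error (ECE) as the metric, which the abstract does not name, and uses made-up confidences for two hypothetical subjects; the function name `expected_calibration_error` is likewise an illustration, not the authors' code.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """ECE: bin voxels by confidence, then average |confidence - accuracy|
    over bins, weighted by the fraction of voxels in each bin."""
    n = len(conf)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical data: every voxel is predicted with confidence 0.75.
conf_a, correct_a = [0.75] * 10, [1] * 10            # subject A: all voxels correct
conf_b, correct_b = [0.75] * 10, [1] * 5 + [0] * 5   # subject B: half correct

ece_a = expected_calibration_error(conf_a, correct_a)
ece_b = expected_calibration_error(conf_b, correct_b)
ece_pooled = expected_calibration_error(conf_a + conf_b, correct_a + correct_b)
```

In this toy case the model is under-confident on subject A and over-confident on subject B, so each subject has an ECE of 0.25, while the pooled ECE is 0 because the two errors cancel. A dataset-level score can therefore hide per-subject miscalibration.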
Uncertainty estimation methods are expected to improve the understanding and quality of computer-assisted methods used in medical applications (e.g., neurosurgical interventions, radiotherapy planning), where automated medical image segmentation is crucial. In supervised machine learning, a common practice for generating ground truth label data is to merge observer annotations. However, many medical imaging tasks show high inter-observer variability resulting from factors such as image quality and differing levels of user expertise and domain knowledge, and little is known about how inter-observer variability and commonly used fusion methods affect the uncertainty estimates of automated image segmentation. In this paper we analyze the effect of common image label fusion techniques on uncertainty estimation, and propose to learn the uncertainty among observers. The results highlight the negative effect that fusion methods applied in deep learning have on obtaining reliable estimates of segmentation uncertainty. Additionally, we show that the learned observers' uncertainty can be combined with current standard Monte Carlo dropout Bayesian neural networks to characterize the uncertainty of the model's parameters.
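To make the contrast between fusing labels and retaining observer disagreement concrete, here is a minimal sketch. It assumes majority voting as the fusion method (the abstract does not say which fusion techniques were studied) and uses the per-voxel fraction of positive votes as a simple stand-in for inter-observer uncertainty; the function names and the three rater masks are hypothetical.

```python
from typing import List, Sequence


def majority_vote(annotations: Sequence[Sequence[int]]) -> List[int]:
    """Fuse binary masks: a voxel is foreground if more than half the raters marked it."""
    n_raters = len(annotations)
    return [1 if 2 * sum(votes) > n_raters else 0 for votes in zip(*annotations)]


def observer_agreement(annotations: Sequence[Sequence[int]]) -> List[float]:
    """Per-voxel fraction of raters who marked foreground (a soft label)."""
    n_raters = len(annotations)
    return [sum(votes) / n_raters for votes in zip(*annotations)]


# Three hypothetical raters annotating the same five voxels (flattened masks).
rater_1 = [0, 1, 1, 1, 0]
rater_2 = [0, 1, 1, 0, 0]
rater_3 = [0, 1, 0, 0, 1]

fused = majority_vote([rater_1, rater_2, rater_3])
soft = observer_agreement([rater_1, rater_2, rater_3])
```

Here `fused` is `[0, 1, 1, 0, 0]`: the second and third voxels receive the same hard label even though the raters agreed unanimously on one and split 2 to 1 on the other. The soft labels keep that distinction, which is the information a fused ground truth discards.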
Background: Automated brain tumor segmentation methods are computational algorithms that yield tumor delineations from, in this case, multimodal magnetic resonance imaging (MRI). We present an automated segmentation method and its results for the resection cavity (RC) in glioblastoma multiforme (GBM) patients using deep learning (DL) technologies. Methods: Post-operative T1w with and without contrast, T2w, and fluid attenuated inversion recovery MRI studies of 30 GBM patients were included. Three radiation oncologists manually delineated the RC to obtain a reference segmentation. We developed a DL cavity segmentation method, which utilizes all four MRI sequences and the reference segmentation to learn to perform RC delineations. We evaluated the segmentation method in terms of Dice coefficient (DC) and estimated volume measurements. Results: Median DC values of the three radiation oncologists were 0.85 (interquartile range [IQR]: 0.08), 0.84 (IQR: 0.07), and 0.86 (IQR: 0.07). The results of the automatic segmentation compared to the three different raters were 0.83 (IQR: 0.14), 0.81 (IQR: 0.12), and 0.81 (IQR: 0.13), which were significantly lower than the DC among raters (chi-square = 11.63, p = 0.04). We did not detect a statistically significant difference in the measured RC volumes between the different raters and the automated method (Kruskal-Wallis test: chi-square = 1.46, p = 0.69). The main sources of error were signal inhomogeneity and similar intensity patterns between cavity and brain tissues. Conclusions: The proposed DL approach yields promising results for automated RC segmentation in this proof-of-concept study. Compared to human experts, its DC values are still subpar.
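For readers unfamiliar with the Dice coefficient used above, the following sketch computes it for two flattened binary masks using the standard definition DC = 2|A ∩ B| / (|A| + |B|). The masks are invented for illustration, and treating two empty masks as perfect agreement is an assumption of this sketch, not something stated in the abstract.

```python
from typing import Sequence


def dice_coefficient(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks of equal length."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical six-voxel masks from a rater and from the automated method.
rater_mask = [0, 1, 1, 1, 0, 0]
model_mask = [0, 1, 1, 0, 1, 0]

score = dice_coefficient(rater_mask, model_mask)
```

Both masks contain three foreground voxels and share two of them, so the score is 2 × 2 / (3 + 3) ≈ 0.67. A value of 1 means identical masks and 0 means no overlap.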