On Simulating Subjective Evaluation Using Combined Objective Metrics for Validation of 3D Tumor Segmentation

Deng, Xiang; Zhu, Lei; Sun, Yiyong; Xu, Chenyang; Song, Lan; Chen, Jiuhong; Merges, Reto D.; Jolly, Marie-Pierre; Suehling, Michael; Xing, Xu

doi:10.1007/978-3-540-75757-3_118

Cited by 11 publications

(7 citation statements)

References 13 publications


“…These aspects make the set of segmentations considered to be correct of great value in the research. They constitute the commonly named 'truest sets' for comparisons of results, or what is called the 'ground truth' (GT) [4,6].…”

Section: Fig

mentioning

confidence: 99%

A new measure for comparing biomedical regions of interest in segmentation of digital images

Conci

Galvao

Sequeiros

et al. 2015

Discrete Applied Mathematics


“…Some authors have focused on the variability of segmentation results in the context of medical imaging and the analysis of segmentation algorithms with respect to multiple reference segmentations [33][34][35][36][37]. A method that combines several complementary quality measures into a single measure has been proposed by Deng et al. [27]. A combined measure that additionally considers the common variability of different users has been proposed in the context of the MICCAI segmentation challenge 2007 [38] and the MICCAI liver tumor segmentation challenge 2008 [39]. This measure is known as the MICCAI score.…”

Section: Related Work

mentioning

confidence: 99%

“…Common static quality measures include volume-based metrics, like the volume overlap (Jaccard coefficient) and the Dice coefficient, as well as surface-based metrics, like the mean and maximum surface distance (Hausdorff distance) [26], and a combined measure known as the Medical Image Computing and Computer Assisted Intervention (MICCAI) score [27]. Reference segmentations are often given by manual delineations generated by domain experts, which are used as a surrogate for the unknown ground truth [28]. An objective quantitative evaluation of interactive segmentation algorithms or algorithms for segmentation editing is more challenging, though, because of their dynamic nature and because their quality also depends on the user's subjective impression and intention.…”

mentioning

confidence: 99%

On the evaluation of segmentation editing tools

et al. 2014


Abstract. Efficient segmentation editing tools are important components in the segmentation process, as no automatic methods exist that always generate sufficient results. Evaluating segmentation editing algorithms is challenging, because their quality depends on the user's subjective impression. So far, no established methods for an objective, comprehensive evaluation of such tools exist and, particularly, intermediate segmentation results are not taken into account. We discuss the evaluation of editing algorithms in the context of tumor segmentation in computed tomography. We propose a rating scheme to qualitatively measure the accuracy and efficiency of editing tools in user studies. In order to objectively summarize the overall quality, we propose two scores based on the subjective rating and the quantified segmentation quality over time. Finally, a simulation-based evaluation approach is discussed, which allows a more reproducible evaluation without the need for human input. This automated evaluation complements user studies, allowing a more convincing evaluation, particularly during development, where frequent user studies are not possible. The proposed methods have been used to evaluate two dedicated editing algorithms on 131 representative tumor segmentations. We show how the comparison of editing algorithms benefits from the proposed methods. Our results also show the correlation of the suggested quality score with the qualitative ratings.
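The static quality measures named in the citation statement above (Jaccard coefficient, Dice coefficient, Hausdorff distance) can be illustrated with a minimal Python sketch. It assumes each segmentation is given as a set of integer voxel coordinates; the function names and the toy `reference` and `predicted` masks are hypothetical and do not come from any of the cited papers. For brevity the Hausdorff distance is computed over all voxels rather than over extracted surface voxels, and with unit voxel spacing.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def dice(a, b):
    """Dice coefficient of two voxel sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient (volume overlap): |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # same assumed convention as in dice()
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest neighbour in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


# Toy example: two 2x2 squares in one slice, shifted by one voxel.
reference = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
predicted = {(0, 1, 0), (1, 1, 0), (0, 2, 0), (1, 2, 0)}

print("Dice:     ", dice(reference, predicted))
print("Jaccard:  ", jaccard(reference, predicted))
print("Hausdorff:", hausdorff(reference, predicted))
```

The brute-force nearest-neighbour search is quadratic in the number of voxels, so it is only suitable for small illustrative masks; a real evaluation would typically restrict the search to surface voxels and account for anisotropic voxel spacing.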


“…These metrics reflect the performance in terms of agreement [5] of a predicted segmentation compared to a reference 'ground truth' (GT). Commonly used metrics include Dice's similarity coefficient (DSC) [6] and other overlap based measures [7], but also metrics based on volume differences, surface distances, and others [8], [9], [10]. A detailed analysis of common metrics and their suitability for segmentation evaluation can be found in [11].…”

Section: Introduction

mentioning

confidence: 99%

Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth

Valindria

Lavdas

Bai

et al. 2017

IEEE Trans. Med. Imaging


When integrating computational tools, such as automatic segmentation, into clinical practice, it is of utmost importance to be able to assess the level of accuracy on new data and, in particular, to detect when an automatic method fails. However, this is difficult to achieve due to the absence of ground truth. Segmentation accuracy on clinical data might be different from what is found through cross validation, because validation data are often used during incremental method development, which can lead to overfitting and unrealistic performance expectations. Before deployment, performance is quantified using different metrics, for which the predicted segmentation is compared with a reference segmentation, often obtained manually by an expert. But little is known about the real performance after deployment when a reference is unavailable. In this paper, we introduce the concept of reverse classification accuracy (RCA) as a framework for predicting the performance of a segmentation method on new data. In RCA, we take the predicted segmentation from a new image to train a reverse classifier, which is evaluated on a set of reference images with available ground truth. The hypothesis is that if the predicted segmentation is of good quality, then the reverse classifier will perform well on at least some of the reference images. We validate our approach on multi-organ segmentation with different classifiers and segmentation methods. Our results indicate that it is indeed possible to predict the quality of individual segmentations, in the absence of ground truth. Thus, RCA is ideal for integration into automatic processing pipelines in clinical routine and as a part of large-scale image analysis studies.
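The RCA idea described in this abstract can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the authors' implementation: images are flat lists of intensities, segmentations are flat lists of 0/1 labels, and the "reverse classifier" is a simple intensity threshold standing in for whatever classifier is actually used. All names (`train_reverse_classifier`, `rca_score`, the sample data) are hypothetical. Taking the maximum Dice over the reference images mirrors the stated hypothesis that a good segmentation yields a classifier that performs well on at least some references.

```python
def dice(a, b):
    """Dice coefficient of two binary label lists of equal length."""
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2 * intersection / total


def train_reverse_classifier(image, predicted_seg):
    """Fit a threshold classifier using the predicted segmentation as labels.

    Assumption: a midpoint threshold between mean foreground and mean
    background intensity is a stand-in for a real classifier.
    """
    fg = [v for v, s in zip(image, predicted_seg) if s]
    bg = [v for v, s in zip(image, predicted_seg) if not s]
    if not fg or not bg:
        raise ValueError("predicted segmentation must contain both classes")
    fg_mean, bg_mean = sum(fg) / len(fg), sum(bg) / len(bg)
    threshold = (fg_mean + bg_mean) / 2
    foreground_is_bright = fg_mean > bg_mean

    def classify(other_image):
        return [int((v > threshold) == foreground_is_bright) for v in other_image]

    return classify


def rca_score(image, predicted_seg, references):
    """Proxy quality: best Dice of the reverse classifier on the references."""
    classify = train_reverse_classifier(image, predicted_seg)
    return max(dice(classify(ref_image), ref_gt) for ref_image, ref_gt in references)


# Toy data: bright voxels are foreground in every image.
new_image = [10, 12, 90, 95, 11, 88]
good_seg = [0, 0, 1, 1, 0, 1]    # matches the bright voxels
failed_seg = [1, 1, 0, 0, 1, 0]  # labels the dark voxels instead
references = [
    ([9, 85, 92, 13], [0, 1, 1, 0]),
    ([91, 10, 12, 87, 14], [1, 0, 0, 1, 0]),
]

print("good segmentation:  ", rca_score(new_image, good_seg, references))
print("failed segmentation:", rca_score(new_image, failed_seg, references))
```

In this sketch the score is computed without any ground truth for `new_image` itself; only the reference images need annotations, which is the point of the framework.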

