Gold standards and expert panels: a pulmonary nodule case study with challenges and solutions

Miller, Dave P.; O’Shaughnessy, Kathryn F.; Wood, Susan A.; Castellino, Ronald A.

doi:10.1117/12.544716

Cited by 17 publications

(15 citation statements)

References 10 publications

“…The findings of these expert panels may be combined and permuted (in a logical and scientifically sound manner) to obtain a series of non-unique “truth” sets against which the performance of the system under consideration may vary substantially (11–13). …”

Section: Introduction (mentioning)

confidence: 99%

Assessment of Radiologist Performance in the Detection of Lung Nodules

et al. 2009

Rationale and Objectives: Studies that evaluate the lung-nodule-detection performance of radiologists or computerized methods depend on an initial inventory of the nodules within the thoracic images (the "truth"). The purpose of this study was to analyze (1) variability in the "truth" defined by different combinations of experienced thoracic radiologists and (2) variability in the performance of other experienced thoracic radiologists based on these definitions of "truth" in the context of lung nodule detection on computed tomography (CT) scans. Materials and Methods: Twenty-five thoracic CT scans were reviewed by four thoracic radiologists, who independently marked lesions they considered to be nodules ≥ 3 mm in maximum diameter. Panel "truth" sets of nodules then were derived from the nodules marked by different combinations of two and three of these four radiologists. The nodule-detection performance of the other radiologists was evaluated based on these panel "truth" sets. Results: The number of "true" nodules in the different panel "truth" sets ranged from 15-89 (mean: 49.8±25.6). The mean radiologist nodule-detection sensitivities across radiologists and panel "truth" sets for different panel "truth" conditions ranged from 51.0-83.2%; mean false-positive rates ranged from 0.33-1.39 per case. Conclusion: Substantial variability exists across radiologists in the task of lung nodule identification in CT scans. The definition of "truth" on which lung nodule detection studies are based
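
The panel "truth" construction described in this abstract can be illustrated with a minimal Python sketch. It is not the study's code: the abstract does not say how a panel's marks are combined, so the `union` and `intersection` rules here are assumptions, and the function names, reader labels, and nodule IDs in `toy_marks` are hypothetical.

```python
from itertools import combinations

def panel_truth_sets(marks, panel_size, rule="union"):
    """Build one "truth" set per panel of readers.

    marks: dict mapping reader name -> set of nodule IDs that reader marked.
    rule:  "union" (any panelist marked it) or "intersection" (all did).
           Both rules are assumptions made for illustration.
    """
    if rule not in ("union", "intersection"):
        raise ValueError("rule must be 'union' or 'intersection'")
    truth_by_panel = {}
    for panel in combinations(sorted(marks), panel_size):
        sets = [marks[reader] for reader in panel]
        if rule == "union":
            truth_by_panel[panel] = set.union(*sets)
        else:
            truth_by_panel[panel] = set.intersection(*sets)
    return truth_by_panel

def sensitivity(reader_marks, truth):
    """Fraction of "true" nodules the reader marked (None if truth is empty)."""
    if not truth:
        return None
    return len(reader_marks & truth) / len(truth)

def false_positives(reader_marks, truth):
    """Number of reader marks that are not in the panel "truth"."""
    return len(reader_marks - truth)

# Hypothetical toy data: four readers, nodules identified by integer IDs.
toy_marks = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}, "D": {1, 3}}

# Score reader C (not on the panel) against the two-reader panel (A, B).
for rule in ("union", "intersection"):
    truth = panel_truth_sets(toy_marks, 2, rule)[("A", "B")]
    print(rule, sorted(truth),
          sensitivity(toy_marks["C"], truth),
          false_positives(toy_marks["C"], truth))
```

Even in this toy case the same reader scores differently depending on how the panel "truth" is defined, which is the kind of variability the abstract reports.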

“…Surrogate end points reduce the cost of follow-up, reduce the challenge of low prevalence, or both. The shortcomings of surrogate end points are that they do not follow a patient to a clinical outcome, are subject to appreciable variability that needs to be addressed (106,107), and may be highly disease and/or technology specific. Several of the workshop speakers indicated that this topic should get more attention.…”

Section: Topics Deserving More Research and Attention (mentioning)

confidence: 99%

Evaluating Imaging and Computer-aided Detection and Diagnosis Devices at the FDA

et al. 2012

This report summarizes the Joint FDA-MIPS Workshop on Methods for the Evaluation of Imaging and Computer-Assist Devices. The purpose of the workshop was to gather information on the current state of the science and facilitate consensus development on statistical methods and study designs for the evaluation of imaging devices to support US Food and Drug Administration submissions. Additionally, participants expected to identify gaps in knowledge and unmet needs that should be addressed in future research. This summary is intended to document the topics that were discussed at the meeting and disseminate the lessons that have been learned through past studies of imaging and computer-aided detection and diagnosis device performance.

“…[3,4] Picking a single favorable "truth" can also help superficially improve results.[5] In addition, there are algorithms that attempt to reconcile the differences between segmentations to get one "ground truth."[6,7] The problem with the algorithms is that once the "ground truth" is established, the differences in the segmentations from which the ground truth was derived are lost.…”

Section: Introduction (mentioning)

confidence: 99%

A shape-dependent variability metric for evaluating panel segmentations with a case study on LIDC

et al. 2010

The segmentation of medical images is challenging because a ground truth is often not available. Computer-Aided Detection (CAD) systems are dependent on ground truth as a means of comparison; however, in many cases the ground truth is derived only from experts' opinions. When the experts disagree, it becomes impossible to discern one ground truth. In this paper, we propose an algorithm to measure the disagreement among radiologists' delineated boundaries. The algorithm accounts for both the overlap and shape of the boundaries in determining the variability of a panel segmentation. After calculating the variability of 3788 thoracic computed tomography (CT) slices in the Lung Image Database Consortium (LIDC), we found that the radiologists have a high consensus in a majority of lung nodule segmentations. However, our algorithm identified a number of segmentations on which the radiologists significantly disagreed. Our proposed method of measuring disagreement can assist others in determining the reliability of panel segmentations. We also demonstrate that it is superior to simply using overlap, which is currently one of the most common ways of measuring segmentation agreement. The variability metric presented has applications to panel segmentations, and also has potential uses in CAD systems.
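
For reference, the overlap-only baseline that this abstract compares against can be sketched as a mean pairwise Jaccard index over the panel's masks. This is an assumed, common formulation of "overlap", not the paper's shape-dependent variability metric. The function names and pixel sets below are hypothetical.

```python
from itertools import combinations

def jaccard(mask_a, mask_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of two masks given as pixel sets."""
    union = mask_a | mask_b
    if not union:
        return 1.0  # two empty masks are treated as in full agreement
    return len(mask_a & mask_b) / len(union)

def mean_pairwise_overlap(masks):
    """Average Jaccard overlap over every pair of readers' masks."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two masks")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy masks: each is a set of (row, col) pixels inside one
# reader's outline of the same nodule on one slice.
reader_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
reader_2 = {(0, 1), (0, 2), (1, 1), (1, 2)}
reader_3 = set(reader_1)

print(mean_pairwise_overlap([reader_1, reader_2, reader_3]))
```

A score like this ignores boundary shape entirely, which is the limitation the abstract says its metric addresses.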
