2019
DOI: 10.1117/1.jmi.6.1.015501

Impact of prevalence and case distribution in lab-based diagnostic imaging studies

Abstract: We investigated effects of prevalence and case distribution on radiologist diagnostic performance as measured by area under the receiver operating characteristic curve (AUC) and sensitivity-specificity in lab-based reader studies evaluating imaging devices. Our retrospective reader studies compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with dense breasts. Mammograms were acquired from the prospective Digital Mammographic Imaging Screening Trial. We performed five read…

Cited by 10 publications (11 citation statements)
References 27 publications
“…M. Wolfe, Horowitz, & Kenner, 2005), and one might expect our participants to have missed hazards more frequently had they been rare. However, prevalence effects in the laboratory can be far weaker in more critical tasks, such as asking radiologists to look for abnormalities in medical images (Gallas et al, 2019; Gur et al, 2003). For that matter, expertise is a significant factor in hazard detection (Underwood, Ngai, & Underwood, 2013), which may aid hazard detection and evasion planning.…”
Section: Discussion
confidence: 99%
“…If the case raises no suspicion, then the reader will swipe vertically across the left-most region of the bar, labelled “normal,” while certainty of cancer present would result in a vertical swipe across the right-most region, labelled “malignant.” The center of the bar is labelled “uncertain.” The horizontal location of the swipe is encoded as a quasi-continuous score ranging from −100 ( normal ) to +100 ( malignant ), with zero in the center ( uncertain ), resulting in 201 possible PoM scores. This large rating scale was patterned after prior work [ 9 ]. The radiologists were instructed to swipe to the left of center if they did not recall the case, and to the right of center if they did.…”
Section: Methods
confidence: 99%
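The quasi-continuous PoM rating described in the statement above — a horizontal swipe position mapped to an integer score from −100 (normal) to +100 (malignant), yielding 201 possible values — can be sketched as a simple rescaling. This is an illustrative sketch, not the study's actual software; the function name and the bar-coordinate convention are assumptions.

```python
def pom_score(swipe_x: float, bar_width: float) -> int:
    """Map a horizontal swipe position on the rating bar to a
    probability-of-malignancy (PoM) score in [-100, +100].

    swipe_x: horizontal coordinate of the swipe, with 0 at the left
    edge ("normal"), bar_width at the right edge ("malignant"), and
    the midpoint labelled "uncertain".  Hypothetical sketch of the
    encoding described in the text, not the study's implementation.
    """
    if not 0.0 <= swipe_x <= bar_width:
        raise ValueError("swipe outside the rating bar")
    # Rescale [0, bar_width] to [-100, +100] and round to the nearest
    # integer, giving the 201 possible scores described in the text.
    return round((swipe_x / bar_width) * 200 - 100)
```

A swipe at the left edge maps to −100, the center to 0, and the right edge to +100, matching the three labelled regions of the bar.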
“…Metrics for explainable AI assessment through human observation should be evaluated as in a multi‐reader multi‐case (MRMC) approach 76,77 . The use of multiple readers and a large set of diverse cases is important, as observer variability can strongly impact the actual performance of computer‐aided (AI‐aided) systems 78–80 . With that in mind, MRMC studies that utilize receiver operating characteristic (ROC) and precision‐recall analyses with proper statistical characterization of variance are ideal for determining if an explanation benefits clinical performance.…”
Section: Evaluation Of Explainability And Interpretability
confidence: 99%
“…76,77 The use of multiple readers and a large set of diverse cases is important, as observer variability can strongly impact the actual performance of computer-aided (AI-aided) systems. [78][79][80] With that in mind, MRMC studies that utilize receiver operating characteristic (ROC) and precision-recall analyses with proper statistical characterization of variance are ideal for determining if an explanation benefits clinical performance. Other metrics include Cohen's κ, F1 score, and changes to accuracy and response time.…”
Section: Human-Based Explainable AI Evaluation
confidence: 99%
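The ROC analysis referenced in the statements above rests on the empirical AUC, which equals the Mann–Whitney U statistic: the probability that a randomly chosen diseased case receives a higher score than a randomly chosen non-diseased case, with ties counted as one half. A minimal sketch (function name and inputs are illustrative, not from the cited work):

```python
def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC via the Mann-Whitney U statistic.

    scores_pos: reader scores for truly positive (diseased) cases.
    scores_neg: reader scores for truly negative (non-diseased) cases.
    Returns the fraction of positive/negative pairs in which the
    positive case outscores the negative one, counting ties as 1/2.
    """
    n_pairs = len(scores_pos) * len(scores_neg)
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / n_pairs
```

An MRMC study computes this per reader and then characterizes the variance across both readers and cases; the pairwise form above is the building block that fully crossed MRMC variance estimators operate on.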