2008
DOI: 10.1118/1.2977766

Binary and multi‐category ratings in a laboratory observer performance study: A comparison

Abstract: We investigated radiologists' performance during retrospective interpretation of screening mammograms when using a binary decision whether to recall a woman for additional procedures or not, and compared it with their receiver operating characteristic (ROC) type performance curves using a semi-continuous rating scale. Under an Institutional Review Board-approved protocol, nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed w…

Cited by 4 publications (5 citation statements)
References 24 publications
“…A bootstrap analysis was carried out to ascertain this. This analysis was similar to what was described by Gur et al [9] in their work comparing mammography reader performances under binary and continuous/multicategory ratings. We summarize the key steps: Let us suppose that the ith reader is denoted by R_i, and that for this reader and a given viewing mode (stereo or mono), the binary TPF and To assess whether the signed difference in sensitivities was significant or not, we performed bootstrap sampling.…”
Section: Discussion (supporting)
confidence: 82%
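
The bootstrap step mentioned in this statement can be illustrated with a minimal sketch. It is not the procedure of Gur et al. or of the citing paper, whose details are truncated above. It assumes each actually-positive case has one 0/1 recall decision under the binary paradigm and one 0/1 decision derived from the rating scale, and it resamples cases in pairs to get a percentile interval for the signed difference in sensitivity. The function names and the parameters `n_boot`, `alpha`, and `seed` are hypothetical.

```python
import random


def sensitivity(decisions):
    """Fraction of actually-positive cases that were called positive (0/1 list)."""
    return sum(decisions) / len(decisions)


def bootstrap_sensitivity_difference(binary_calls, rating_calls,
                                     n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for sensitivity(binary) - sensitivity(rating).

    Assumption: both lists hold 0/1 decisions on the same positive cases,
    in the same order, so each case is resampled as a pair.
    Returns (observed_difference, lower_bound, upper_bound).
    """
    if not binary_calls or len(binary_calls) != len(rating_calls):
        raise ValueError("need two non-empty lists of equal length")

    rng = random.Random(seed)
    n = len(binary_calls)
    observed = sensitivity(binary_calls) - sensitivity(rating_calls)

    diffs = []
    for _ in range(n_boot):
        # Draw n case indices with replacement and recompute the difference.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(binary_calls[i] - rating_calls[i] for i in idx) / n)

    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, lower, upper
```

Under these assumptions, an interval that excludes zero would suggest a systematic difference between the two sensitivities for that reader; an interval that covers zero would not.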
“…The confidence scores, which indicated the reader's confidence in the presence of an abnormality, were collected for analyzing the readers' performance in an experimental setting using the receiver operating characteristic curve (ROC). Previous works [9] on analyzing binary and continuous/multi-category ratings in mammography observer studies have demonstrated that these two decision-making processes yield similar reader performances even though the binary true positive fraction (TPF) and false-positive fraction (FPF) operating points do not always lie on the ROC curves but in their vicinity.…”
Section: Methods (mentioning)
confidence: 99%
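
One way to make "in the vicinity of the ROC curve" concrete is sketched below. It builds an empirical ROC curve from a reader's confidence ratings and measures the vertical gap between that curve and the reader's binary (FPF, TPF) operating point. The vertical gap is an illustrative measure chosen here, not one taken from the cited paper, and all function names are hypothetical.

```python
def empirical_roc(pos_ratings, neg_ratings):
    """Empirical ROC points (FPF, TPF), running from (0, 0) to (1, 1).

    A case is called positive at threshold t when its rating is >= t.
    """
    thresholds = sorted(set(pos_ratings) | set(neg_ratings), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(r >= t for r in pos_ratings) / len(pos_ratings)
        fpf = sum(r >= t for r in neg_ratings) / len(neg_ratings)
        points.append((fpf, tpf))
    return points


def roc_tpf_at(points, fpf):
    """TPF of the piecewise-linear empirical ROC at a given FPF.

    Where the curve has a vertical segment at this FPF, the highest TPF is used.
    """
    candidates = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpf <= x1:
            if x1 == x0:
                candidates.append(y1)  # vertical segment
            else:
                candidates.append(y0 + (y1 - y0) * (fpf - x0) / (x1 - x0))
    if not candidates:
        raise ValueError("fpf must lie in [0, 1]")
    return max(candidates)


def vertical_gap(points, binary_fpf, binary_tpf):
    """Signed gap: binary TPF minus ROC-curve TPF at the same FPF."""
    return binary_tpf - roc_tpf_at(points, binary_fpf)
```

A gap near zero places the binary operating point close to the curve; a negative gap means the binary decision fell below the curve at that false-positive fraction.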
“…In a prior paper [30] we demonstrated an absence of systematic differences between the radiologist-specific Binary “operating point” and his/her ROC curve. That seemingly contradicting finding to the results of this paper is actually not a contradiction, rather a natural consequence of the differences in the performance measures used under the different paradigms.…”
Section: Discussion (mentioning)
confidence: 69%
“…In our previous paper [30] we experimentally demonstrated that reader-specific performances under the Binary paradigm do not exhibit systematic fluctuations from the ROC curves. That result experimentally supports the theoretical relationship often presumed to exist, at least on average, between the ROC curve and the operating point (sensitivity, specificity) estimated under the ROC and Binary paradigms.…”
Section: Introduction (mentioning)
confidence: 99%
“…Even though we found a considerable consistency among the performance levels of the readers regardless of the specific method or rating scale used (i.e., binary, receiver operating characteristic (ROC) or free-response ROC (FROC)) (2,3), our study showed that radiologists may perform significantly different in the clinic than in the laboratory when reading the very same cases. The differences were reflected in terms of both their overall performance levels (e.g.…”
mentioning
confidence: 55%