2020
DOI: 10.1001/jamaoncol.2020.3321

External Evaluation of 3 Commercial Artificial Intelligence Algorithms for Independent Assessment of Screening Mammograms

Abstract: IMPORTANCE A computer algorithm that performs at or above the level of radiologists in mammography screening assessment could improve the effectiveness of breast cancer screening. OBJECTIVE To perform an external evaluation of 3 commercially available artificial intelligence (AI) computer-aided detection algorithms as independent mammography readers and to assess the screening performance when combined with radiologists. DESIGN, SETTING, AND PARTICIPANTS This retrospective case-control study was based on a doubl…

Cited by 187 publications (161 citation statements)
References 29 publications
“… 3 Three commercially available AI algorithms have recently been subject to an external, independent evaluation in a case–control setting. 7 One of the studied algorithms showed a significantly higher performance compared with the other two algorithms, highlighting the importance of construction and training data for external validity. Interestingly, combining the best performing AI with the first reader/radiologist was more successful than combining the first and second readers regarding cancer detection rate, however, at the cost of increasing false-positive cases.…”
Section: Discussion (mentioning)
confidence: 90%
“…Interestingly, combining the best performing AI with the first reader/radiologist was more successful than combining the first and second readers regarding cancer detection rate, however, at the cost of increasing false-positive cases. 7 Another study, also evaluating a different AI program than Transpara, illustrates the complexity both in calibrating and in evaluating AI algorithms based on retrospective data. 8 For instance, reasonable cut-off levels must be chosen for the AI assessments (that are provided on a continuous scale) to enable comparison with the human assessments (that are provided on a binary scale: negative/suspected malignancy).…”
Section: Discussion (mentioning)
confidence: 99%
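
The cut-off problem described in the statement above can be illustrated with a minimal sketch. It assumes one possible convention, namely choosing the AI cut-off so that specificity on cancer-free exams reaches a chosen target (for example, a human reader's specificity), and then flagging any exam whose score is at or above that cut-off. The function names and the scores are hypothetical illustrations, not taken from any of the cited studies.

```python
def cutoff_for_specificity(negative_scores, target_specificity):
    """Smallest cut-off whose specificity on cancer-free exams meets the target.

    An exam counts as flagged when its score is >= the cut-off, so specificity
    is the fraction of cancer-free exams scoring strictly below the cut-off.
    """
    n = len(negative_scores)
    # Candidate cut-offs: each observed score, plus infinity (flag nothing).
    candidates = sorted(set(negative_scores)) + [float("inf")]
    for cutoff in candidates:
        specificity = sum(score < cutoff for score in negative_scores) / n
        if specificity >= target_specificity:
            return cutoff


def binarize(scores, cutoff):
    """Convert continuous AI scores into binary flags (True = suspected)."""
    return [score >= cutoff for score in scores]


def sensitivity(positive_scores, cutoff):
    """Fraction of cancer exams flagged at the given cut-off."""
    return sum(binarize(positive_scores, cutoff)) / len(positive_scores)


# Hypothetical AI scores, for illustration only.
cancer_free = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
cancers = [0.35, 0.65, 0.92, 0.95]

cutoff = cutoff_for_specificity(cancer_free, target_specificity=0.9)
ai_sensitivity = sensitivity(cancers, cutoff)
```

Once the scores are binarized this way, the AI's sensitivity can be compared with a human reader's at a matched specificity. Other conventions (for example, matching the recall rate instead) would lead to different cut-offs.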
“…Since the Kasparov’s Law (KL), as mentioned in the Introduction, can be framed as two different statements, we designed two different experiments: In order to evaluate the first KL statement, we used the InceptionV3 model as the “Strong” machine (which was actually more accurate than the best human reader involved in this study), and we then used the MobileNet model as the Machine (second reader) to be used in the “better” process of human-AI collaboration. Specifically, the Machine was always recruited as the second observer, like done in [39], where it was found that combining the first reader with the best algorithm identified more pathological cases than having a human as the second reader. In doing so, we considered a total of 90 3-reader 2-permutations.…”
Section: Methods (mentioning)
confidence: 99%
“…Recent works suggest that the current generation of AI-based algorithms may interpret mammograms at least at the level of human readers (19–23). Evaluations were based on small-scale reader studies (19–21) and larger scale retrospective evaluations (21–23) performed on artificially enriched sets, often involving resampling in an attempt to approximate a more representative screening population, and with dataset images significantly skewed towards a single mammography hardware vendor in each case. AI and its true potential to positively transform clinical practice on real-world screening populations remains to be confirmed.…”
Section: Introduction (mentioning)
confidence: 99%