2020
DOI: 10.1016/j.jacr.2020.01.006

Inconsistent Performance of Deep Learning Models on Mammogram Classification


Cited by: 133 publications (91 citation statements)
References: 31 publications
“…Yousaf et al [26] used transfer learning to fine-tune nine CNN models pre-trained on ImageNet to achieve age-invariant face recognition. Wang et al [27] used CNNs to classify benign and malignant mammograms and showed that transfer learning on similar data can improve model performance. The research of Matthews et al [28] supports this view as well.…”
Section: Introduction (mentioning)
Confidence: 99%
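The transfer-learning workflow described in this statement (fine-tuning an ImageNet-pretrained CNN for benign/malignant mammogram classification) can be sketched as follows. This is a minimal illustration, not the cited authors' actual pipeline; the choice of ResNet-50, the layer-freezing scheme, and the optimizer settings are all assumptions, and the data loader is left hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pre-trained on ImageNet (ResNet-50 chosen purely for illustration).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet head with a binary benign/malignant head.
model.fc = nn.Linear(model.fc.in_features, 2)

# Assumption: freeze early layers so only the last block and head are fine-tuned.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

def fine_tune(model, loader, epochs=5):
    """Fine-tune on mammogram data; `loader` yields (image, label) batches."""
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```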
“…AI-based algorithms are prone to behaving in unpredictable ways when applied in the real world. For example, algorithm performance may degrade when applied to images generated by equipment from a different manufacturer or in a different clinical environment than those of the training set [ 26 , 27 ]. Algorithm performance can degrade over time when original training characteristics change [ 28 ].…”
Section: Gaps in the Current Regulatory Framework (mentioning)
Confidence: 99%
“…Algorithm performance tends to vary substantially from site to site in the real world [ 26 , 32 , 33 ]. This variability highlights the need for validation of algorithm performance at each clinical site before installation.…”
Section: Gaps in the Current Regulatory Framework (mentioning)
Confidence: 99%
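The site-level validation step this statement calls for can be made concrete as a small acceptance check: score the model on a locally collected, labeled test set and compare its auROC against the performance claimed at approval. This is a sketch under assumptions; the function name, the tolerance value, and the commented-out usage are all hypothetical.

```python
from sklearn.metrics import roc_auc_score

def site_acceptance_check(labels, scores, claimed_auroc, tolerance=0.05):
    """Compare a model's auROC on local site data against the vendor's claim.

    labels: ground-truth binary labels from the local test set
    scores: model-predicted malignancy probabilities for those cases
    """
    local_auroc = roc_auc_score(labels, scores)
    degraded = local_auroc < claimed_auroc - tolerance
    return local_auroc, degraded

# Hypothetical usage: flag the model if local performance falls more than
# 0.05 auROC below the value reported at approval.
# local_auroc, degraded = site_acceptance_check(local_labels, local_scores, 0.95)
```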
“…In 2020, Wang et al published data on the performance of six deep learning models: three from the literature and three developed by their team [2]. They tested the models on four data sets: three publicly available case repositories and one from their own institution.…”
Section: The Challenge of Developing Generalizable AI-Enabled Analytics (mentioning)
Confidence: 99%
“…A model created by Ribli et al [6] was trained on pooled DDSM data plus data from a university hospital, with a reported auROC of 0.95 on the INbreast validation data set. Wang's models were trained on the ImageNet data set [2, 7]. The auROCs for these three models were 0.71, 0.75, and 0.79 when tested on the DDSM validation data set.…”
Section: The Challenge of Developing Generalizable AI-Enabled Analytics (mentioning)
Confidence: 99%
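The cross-dataset evaluation described in this statement, where one fixed model is scored on several external validation sets to expose the auROC spread, can be sketched as below. The dataset names come from the passage; the `predict` callable and the data-loading details are hypothetical assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_across_datasets(predict, datasets):
    """Score one trained model on multiple validation sets.

    predict:  callable mapping an image array to a malignancy score in [0, 1]
    datasets: dict of name -> (images, binary_labels)
    """
    results = {}
    for name, (images, labels) in datasets.items():
        scores = np.array([predict(img) for img in images])
        results[name] = roc_auc_score(labels, scores)
    return results

# Hypothetical usage with the validation sets named in the passage:
# aurocs = evaluate_across_datasets(model_predict,
#                                   {"DDSM": ddsm, "INbreast": inbreast})
# A large gap between entries (e.g., 0.95 vs 0.71) signals poor generalization.
```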