Inconsistent Performance of Deep Learning Models on Mammogram Classification

Wang, Xiaoqin; Liang, Gongbo; Zhang, Yu; Blanton, Hunter; Bessinger, Zachary; Jacobs, Nathan

doi:10.1016/j.jacr.2020.01.006

Cited by 133 publications

(91 citation statements)

References 31 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…Yousaf et al [ 26 ] used transfer learning to fine-tune nine CNN models pre-trained by ImageNet to achieve age-invariant face recognition. Wang et al [ 27 ] used CNN to classify benign and malignant mammograms, and proved that the use of transfer learning on similar data can help improve the performance of the model. And the research of Matthews et al [ 28 ] also proved this view.…”

Section: Introductionmentioning

confidence: 99%

New convolutional neural network model for screening and diagnosis of mammograms

et al. 2020

View full text Add to dashboard Cite

Breast cancer is the most common cancer in women and poses a great threat to women's life and health. Mammography is an effective method for the diagnosis of breast cancer, but the results are largely limited by the clinical experience of radiologists. Therefore, the main purpose of this study is to perform two-stage classification (Normal/Abnormal and Benign/ Malignancy) of two-view mammograms through convolutional neural network. In this study, we constructed a multi-view feature fusion network model for classification of mammograms from two views, and we proposed a multi-scale attention DenseNet as the backbone network for feature extraction. The model consists of two independent branches, which are used to extract the features of two mammograms from different views. Our work mainly focuses on the construction of multi-scale convolution module and attention module. The final experimental results show that the model has achieved good performance in both classification tasks. We used the DDSM database to evaluate the proposed method. The accuracy, sensitivity and AUC values of normal and abnormal mammograms classification were 94.92%, 96.52% and 94.72%, respectively. And the accuracy, sensitivity and AUC values of benign and malignant mammograms classification were 95.24%, 96.11% and 95.03%, respectively.

show abstract

Section: Introductionmentioning

confidence: 99%

New convolutional neural network model for screening and diagnosis of mammograms

et al. 2020

View full text Add to dashboard Cite

show abstract

“…AI-based algorithms are prone to behaving in unpredictable ways when applied in the real world. For example, algorithm performance may degrade when applied to images generated by equipment from a different manufacturer or in a different clinical environment than those of the training set [ 26 , 27 ]. Algorithm performance can degrade over time when original training characteristics change [ 28 ].…”

Section: Gaps In the Current Regulatory Frameworkmentioning

confidence: 99%

“…Algorithm performance tends to vary substantially from site to site in the real world [ 26 , 32 , 33 ]. This variability highlights the need for validation of algorithm performance at each clinical site before installation.…”

Section: Gaps In the Current Regulatory Frameworkmentioning

confidence: 99%

Regulatory Frameworks for Development and Evaluation of Artificial Intelligence–Based Diagnostic Imaging Algorithms: Summary and Recommendations

Larson

Harvey

Rubin

et al. 2021

Journal of the American College of Radiology

103

View full text Add to dashboard Cite

Although artificial intelligence (AI)-based algorithms for diagnosis hold promise for improving care, their safety and effectiveness must be ensured to facilitate wide adoption. Several recently proposed regulatory frameworks provide a solid foundation but do not address a number of issues that may prevent algorithms from being fully trusted. In this article, we review the major regulatory frameworks for software as a medical device applications, identify major gaps, and propose additional strategies to improve the development and evaluation of diagnostic AI algorithms. We identify the following major shortcomings of the current regulatory frameworks: (1) conflation of the diagnostic task with the diagnostic algorithm, (2) superficial treatment of the diagnostic task definition, (3) no mechanism to directly compare similar algorithms, (4) insufficient characterization of safety and performance elements, (5) lack of resources to assess performance at each installed site, and (6) inherent conflicts of interest. We recommend the following additional measures: (1) separate the diagnostic task from the algorithm, (2) define performance elements beyond accuracy, (3) divide the evaluation process into discrete steps, (4) encourage assessment by a third-party evaluator, (5) incorporate these elements into the manufacturers' development process. Specifically, we recommend four phases of development and evaluation, analogous to those that have been applied to pharmaceuticals and proposed for software applications, to help ensure world-class performance of all algorithms at all installed sites. In the coming years, we anticipate the emergence of a substantial body of research dedicated to ensuring the accuracy, reliability, and safety of the algorithms.

show abstract

“…In 2020, Wang et al published data on the performance of six deep learning models, three from the literature, and three developed by their team [2]. They tested the models variously on four different data sets, three publicly available case repositories and one from their own institution.…”

Section: The Challenge Of Developing Generalizable Ai Enabled Analytimentioning

confidence: 99%

“…A model created by Ribli et al [6] was trained on pooled DDSM data and data from a university hospital with a reported auROC of 0.95 on the INbreast validation data set. Wang's models were trained on an ImageNet [2,7] data set. The auROCs for these three models were 0.71, 0.75, and 0.79 when tested on the DDSM validation data set.…”

Section: The Challenge Of Developing Generalizable Ai Enabled Analytimentioning

confidence: 99%

Rethinking the Approach to Artificial Intelligence for Medical Image Analysis: The Case for Precision Diagnosis

Thrall

Fessell

Pandharipande

2021

Journal of the American College of Radiology

View full text Add to dashboard Cite

Inconsistent Performance of Deep Learning Models on Mammogram Classification

Cited by 133 publications

References 31 publications

New convolutional neural network model for screening and diagnosis of mammograms

New convolutional neural network model for screening and diagnosis of mammograms

Regulatory Frameworks for Development and Evaluation of Artificial Intelligence–Based Diagnostic Imaging Algorithms: Summary and Recommendations

Rethinking the Approach to Artificial Intelligence for Medical Image Analysis: The Case for Precision Diagnosis

Contact Info

Product

Resources

About