2019
DOI: 10.2214/ajr.18.20224
Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods

Cited by 290 publications (141 citation statements)
References 20 publications
“…21 A limitation of machine learning is, however, that algorithms may perform well in the sample they were trained on but rarely generalize to new data. 22,23 To address this, previous studies have applied within-sample cross-validation (CV), in which a given sample is iteratively divided into training and test data to ensure that model training and testing are conducted on different datasets. 18,24 While reducing the likelihood of overfitting, this approach leaves unaddressed the question whether the algorithm indeed generalizes to new and unseen data from independently recruited participants, 25 which is considered the gold standard of evaluating machine-learning performance.…”
Section: Introduction
confidence: 99%
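The within-sample cross-validation scheme this excerpt describes, iteratively dividing one sample into training and test folds so that training and testing never use the same data, can be sketched in plain Python. The function name and fold layout below are illustrative, not taken from the cited study:

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs: each sample serves as test data
    in exactly one of the k folds, so training and testing never overlap."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once, deterministically
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in range(k) if f != i for j in folds[f]]
        yield train_idx, test_idx

splits = list(k_fold_splits(100, k=5))
```

As the excerpt cautions, a high average score across these folds still says nothing about independently recruited participants; that requires a separate, externally collected test set.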
“…It is generally agreed that the interpretation of machine learning models is non-trivial and often referred to as a "black box". 15 While there are tools in place to aid in the interpretation of some models, they do not apply to all of the models we trained. As an example, the Supplement Materials detail some basic interpretation information referred to as "feature importances", denoting which features are most influential in the models.…”
Section: The Model Results In
confidence: 99%
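Feature importances of the kind this excerpt's supplement describes can be approximated for any trained model via permutation: shuffle one feature column and measure how much the score drops. A minimal, model-agnostic sketch in plain Python (the function names and the toy "model" are illustrative, not from the cited work):

```python
import random

def permutation_importance(predict, X, y, metric, seed=0):
    """Importance of feature j = baseline score minus the score obtained
    after shuffling column j (breaking its link to the target)."""
    rng = random.Random(seed)
    baseline = metric(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
        importances.append(baseline - metric(predict(X_perm), y))
    return importances

# Toy check: a "model" that reads only feature 0, so feature 1 is irrelevant.
accuracy = lambda preds, ys: sum(p == t for p, t in zip(preds, ys)) / len(ys)
predict = lambda X: [row[0] for row in X]
X = [[i % 2, i % 3] for i in range(30)]
y = [row[0] for row in X]
imp = permutation_importance(predict, X, y, accuracy)
```

Unlike model-specific importance measures, this approach applies uniformly across model families, which matters when, as the excerpt notes, interpretation tools do not cover every model that was trained.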
“…The process included splitting the dataset into training and testing sets, cross-validation, hyperparameter tuning, and a final evaluation on the testing set. 14,15 We developed a set of clinical criteria required to pass a model, and developed a tie-breaking scheme when multiple models for a single dataset were acceptable. Subsequently, we performed a retrospective analysis on a collection of variants that had been orthogonally confirmed.…”
Section: Overview
confidence: 99%
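The workflow this excerpt lists, splitting off a test set, cross-validating, tuning hyperparameters, and then performing one final evaluation on the test set, can be sketched generically. All names, the threshold "model", and the parameter grid below are illustrative stand-ins, not the cited study's pipeline:

```python
import random

def holdout_then_cv(X, y, fit, score, param_grid, test_frac=0.25, k=3, seed=0):
    """Hold out a test set, tune hyperparameters by k-fold CV on the
    training portion only, then score once on the untouched test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def cv_score(params):
        total = 0.0
        for i in range(k):
            val = train_idx[i::k]                 # i-th fold is the validation set
            val_set = set(val)
            tr = [j for j in train_idx if j not in val_set]
            model = fit([X[j] for j in tr], [y[j] for j in tr], params)
            total += score(model, [X[j] for j in val], [y[j] for j in val])
        return total / k

    best = max(param_grid, key=cv_score)          # hyperparameter tuning via CV
    final = fit([X[j] for j in train_idx], [y[j] for j in train_idx], best)
    return best, score(final, [X[j] for j in test_idx], [y[j] for j in test_idx])

# Toy problem: the label is 1 when the single feature exceeds 19; the "model"
# is just a threshold, and "fitting" adopts the candidate threshold.
X = [[v] for v in range(40)]
y = [int(v > 19) for v in range(40)]
fit = lambda Xs, ys, t: t
score = lambda t, Xs, ys: sum(int(x[0] > t) == yy for x, yy in zip(Xs, ys)) / len(ys)
best, test_acc = holdout_then_cv(X, y, fit, score, param_grid=[19, 10, 30])
```

The key discipline the excerpt's criteria enforce is visible in the structure: the test indices are never touched during tuning, so the final score is an honest estimate.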
“…As a novel aside, the algorithm was compared to the accuracy of medically naive students trained on the same images who were able to achieve similar results, demonstrating the power of pattern recognition in radiodiagnosis, whether artificial or human. Along with a more extensively trained algorithm for hip fracture detection that was recently published by another Australian radiologist 3 as well as a number of other articles arising from medical imaging departments in Australia and New Zealand, 4,5 we can be confident that there are many technologically adept members of our profession that are capable of embracing this new technology and making it our own.…”
confidence: 99%