2021
DOI: 10.21203/rs.3.rs-883606/v1
Preprint

Dataset Size Sensitivity Analysis of Machine Learning Classifiers to Differentiate Molecular Markers of Pediatric Low-Grade Gliomas Based on MRI

Abstract: Machine learning (ML) approaches can predict BRAF status of pediatric low-grade gliomas (pLGG) on pre-therapeutic brain MRI. The impact of training data sample size and type of ML model is not established. In this bi-institutional retrospective study, 251 pLGG FLAIR MRI datasets from 2 children’s hospitals were included. Radiomics features were extracted from tumor segmentations and five models (Random Forest, XGBoost, Neural Network (NN) 1 (100:20:2), NN2 (50:10:2), NN3 (50:20:10:2)) were tested to classify t…

Cited by 4 publications (2 citation statements)
References 27 publications
“…When a ML algorithm is trained with a sufficiently large sample size, the dependence of the test accuracy on the test/train ratio should be small. 37 Figure 5e illustrates that the standard deviation in the test accuracy decreases monotonically from a sample size of 100−300 for all wavelengths in the ensemble model. This result indicates that while a sample size of 300 is optimal for this application, a sample size of 200 may be sufficient.…”
Section: Results and Discussion (mentioning)
confidence: 97%
“…One study published in preprint, used neural networks to classify BRAF-mutational status in a single institution, though the algorithm required manual segmentation (16). The sensitivity of the dataset size on BRAF mutation classification performance was studied by Wagner et al in a radiomics based study (39). They showed that Neural networks outperform XGBoost for classification AUC and that the performance was affected by the size of the data used in training.…”
Section: Discussion (mentioning)
confidence: 99%