Background
To compare the performance of an AI model, built with strategies designed to overcome a small development set, with that of pediatric ER physicians on a triage classification task for pediatric elbow radiographs.

Methods
A total of 1,314 pediatric lateral elbow radiographs (mean patient age: 8.2 years) were retrospectively retrieved, classified as normal or abnormal (with pathology) based on their annotations, and randomly partitioned into a development set (993 images), a tuning set (109 images), a second tuning set (100 images), and a test set (112 images). The AI model was trained on the development set and used the EfficientNet B1 compound scaling network architecture with online augmentations. Its performance on the test set was compared with that of a group of five physicians (inter-rater agreement: fair). Statistical analysis: the DeLong method was used for the AUC of the AI model, and the McNemar test was used to compare the performance of the AI model with that of the physician group.

Results
On the test set, the model achieved an accuracy of 0.804 (95% CI, 0.718 to 0.873) and an AUROC of 0.872 (95% CI, 0.831 to 0.947). Compared with the physician group on the test set, the AI model had a sensitivity of 0.790 (95% CI, 0.684 to 0.895) vs 0.649 (95% CI, 0.525 to 0.773), p = 0.088, and a specificity of 0.818 (95% CI, 0.716 to 0.920) vs 0.873 (95% CI, 0.785 to 0.961), p = 0.439.

Conclusions
The AI model for elbow radiograph triage, designed with strategies to optimize performance with a small development set, showed performance comparable to that of physicians.
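The interval estimates and paired comparison described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The counts are hypothetical: 45 of 57 was chosen only because it gives a proportion close to the reported sensitivity, and the discordant-pair counts are purely illustrative. The abstract does not state which interval method or which McNemar variant was used, or how the five physicians' readings were pooled, so the normal-approximation (Wald) interval and the exact binomial form of the McNemar test are assumptions here.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: cases the first reader got right and the second got wrong
    c: cases the second reader got right and the first got wrong
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, not taken from the abstract.
sens, lo, hi = wald_interval(45, 57)
print(f"sensitivity {sens:.3f} (95% CI {lo:.3f} to {hi:.3f})")

# Hypothetical discordant pairs for a model-vs-reader comparison.
print(f"McNemar exact p = {mcnemar_exact(3, 11):.3f}")
```

The McNemar test uses only the discordant pairs because cases that both readers classify correctly, or both incorrectly, carry no information about which reader is better.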