Structured Abstract

Objective: To compare the performance of machine learning-based, center-specific (MLCS) models and the US national registry-based, multicenter model (SART model) in predicting IVF live birth probabilities (LBPs) for 6 unrelated, geographically diverse US fertility centers.

Design: Retrospective observational design.

Subjects: Test sets comprised first IVF cycle data (2013-2022) extracted from a retrospective cohort of 4,645 patients at 6 fertility centers.

Intervention or Exposure: The initial (MLCS1) and updated (MLCS2) models were compared against age control. MLCS2 and SART models were compared.

Main Outcome Measures: Model validation metrics, reported as median and interquartile range (IQR), were compared using the Wilcoxon signed-rank test: ROC AUC, posterior log-likelihood of odds ratio compared to age (PLORA), Precision-Recall (PR) AUC, F1 score, and continuous net reclassification improvement (NRI).

Results: MLCS1 and MLCS2 models showed improved AUC and PLORA compared to age control; MLCS1 models were validated using out-of-time test data. MLCS2 models showed improved PLORA at 23.9 (IQR 10.2, 39.4) compared to 7.2 (IQR 3.6, 11.8) for MLCS1, p<0.05. MLCS2 showed higher median PR AUC at 0.75 (IQR 0.73, 0.77) compared to 0.69 (IQR 0.68, 0.71) for SART, p<0.05. In addition, the median F1 score was higher for MLCS2 than for the SART model across predicted live birth probability (LBP) thresholds sampled at deciles (≥40%, ≥50%, ≥60%, and ≥70%). For example, at the 50% LBP threshold, MLCS2 had a median F1 score of 0.74 (IQR 0.72, 0.78) compared to 0.71 (IQR 0.68, 0.73) for SART. At these six centers, using the LBP threshold of ≥50%, MLCS2 models can identify ∼84% of patients who would go on to have IVF live births, while the SART model can only identify ∼75%.
That means that for every 100 patients who will have a first IVF cycle live birth, using LBP ≥50% as the threshold, the MLCS2 model can identify 9 more such patients than the SART model, without overcalling or overestimating LBPs.

Conclusion: MLCS models accurately assign higher IVF LBPs to more patients compared to the SART model at 6 US fertility centers. We recommend testing a larger sample of fertility centers to evaluate generalizability of MLCS model benefits.
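
To make the threshold-based metrics reported in the Results concrete, the following is a minimal sketch of how recall (the share of eventual live births that a model flags) and the F1 score might be computed at an LBP threshold of ≥50%. The function name `recall_and_f1` and the eight-patient toy data are hypothetical and chosen only for illustration; they are not the study's code or data, and the study reports medians across centers rather than a single pooled calculation.

```python
def recall_and_f1(predicted_lbp, live_birth, threshold=0.5):
    """Return (recall, F1) when patients with predicted LBP >= threshold are flagged.

    predicted_lbp: predicted live birth probabilities in [0, 1]
    live_birth:    observed outcomes (1 = live birth, 0 = no live birth)
    """
    tp = fp = fn = 0
    for prob, outcome in zip(predicted_lbp, live_birth):
        flagged = prob >= threshold
        if flagged and outcome:
            tp += 1          # flagged and had a live birth
        elif flagged and not outcome:
            fp += 1          # flagged but no live birth (overcalling)
        elif not flagged and outcome:
            fn += 1          # live birth the model did not flag
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, f1


# Hypothetical toy data (not from the study): 8 patients, 5 live births.
toy_lbp = [0.62, 0.55, 0.48, 0.71, 0.35, 0.52, 0.44, 0.66]
toy_outcome = [1, 1, 1, 1, 0, 0, 0, 1]

recall, f1 = recall_and_f1(toy_lbp, toy_outcome, threshold=0.5)
print(f"recall at LBP >= 50%: {recall:.2f}, F1: {f1:.2f}")
```

Because F1 combines precision and recall, a model that raised recall simply by assigning high LBPs to everyone would be penalized through lower precision. This is why a higher F1 alongside higher recall is consistent with the statement that more patients are identified without overcalling.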