Objectives: To conduct a systematic review of studies externally validating the ADNEX model for ovarian cancer diagnosis and to perform a meta-analysis of its performance.
Design: Systematic review and meta-analysis.
Data sources: Medline, EMBASE, Web of Science, Scopus, and EuropePMC up to 15 May 2023.
Review methods: We included external validation studies of the performance of ADNEX using any study design and any study population comprising patients with an adnexal mass. Two independent reviewers extracted data. Disagreements were resolved through discussion. Reporting quality of the studies was scored using the TRIPOD reporting guideline, and methodological conduct and risk of bias were assessed using the PROBAST tool. We performed random-effects meta-analyses of the AUC and, at the 10% risk-of-malignancy threshold, of sensitivity, specificity, Net Benefit, and Relative Utility.
Results: We included 47 studies (17,007 tumours) with a median study sample size of 261 (range 24-4905). On average, 61% of TRIPOD items were reported. Handling of missing data, sample size justification, and model calibration were rarely described. Of the validations, 91% were at high risk of bias, mainly due to the unexplained exclusion of incomplete cases, low sample size, or absent calibration assessment. The summary AUC to distinguish benign from malignant tumours in operated patients was 0.93 (95% CI 0.92-0.94, 95% prediction interval 0.85-0.98) for ADNEX with CA125 as a predictor (9202 tumours, 43 centres, 18 countries, 21 studies) and 0.93 (95% CI 0.91-0.94, 95% prediction interval 0.85-0.98) for ADNEX without CA125 (6309 tumours, 31 centres, 13 countries, 12 studies). The estimated probability that the model has clinical utility in a new centre was 95% (with CA125) and 91% (without CA125). When restricting the analysis to studies at low risk of bias, summary AUCs were 0.93 (with CA125) and 0.91 (without CA125), and the estimated probabilities that the model has clinical utility were 89% (with CA125) and 87% (without CA125).
Discussion: ADNEX performed well in distinguishing benign from malignant tumours in populations from different countries and settings, regardless of whether CA125 was used. A key limitation is that calibration was rarely assessed.
Review registration: PROSPERO, CRD42022373182
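
Illustrative note: For readers less familiar with the utility measures named above, the following minimal Python sketch shows how Net Benefit and Relative Utility at a risk threshold are commonly defined in decision-curve analysis. It is an assumption-laden illustration, not the computation used in this review: the function names are hypothetical, the counts in the example are invented and not taken from any included study, and the Relative Utility formula follows the usual definition (gain over the best default strategy, scaled by the gain of perfect prediction).

```python
def net_benefit(tp, fp, n, threshold=0.10):
    """Net Benefit of classifying as malignant when predicted risk >= threshold.

    tp, fp: true and false positives at the threshold; n: total tumours.
    False positives are weighted by the odds of the threshold.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(n_malignant, n, threshold=0.10):
    """Net Benefit of the default strategy that treats every mass as malignant."""
    # Treating all: every malignant tumour is a TP, every benign one an FP.
    return net_benefit(n_malignant, n - n_malignant, n, threshold)


def relative_utility(tp, fp, n_malignant, n, threshold=0.10):
    """Relative Utility: gain over the best default (treat all or treat none),
    divided by the gain a perfect model would achieve over that default."""
    nb_model = net_benefit(tp, fp, n, threshold)
    # Treat-none has Net Benefit 0, so the best default is the larger of the two.
    nb_default = max(net_benefit_treat_all(n_malignant, n, threshold), 0.0)
    prevalence = n_malignant / n  # Net Benefit of a perfect model
    denom = prevalence - nb_default
    if denom <= 0:
        raise ValueError("Relative Utility is undefined for this prevalence")
    return (nb_model - nb_default) / denom


if __name__ == "__main__":
    # Hypothetical centre: 1000 tumours, 300 malignant; at the 10% threshold
    # the model flags 285 malignant (TP) and 140 benign (FP) tumours.
    print(net_benefit(285, 140, 1000))
    print(net_benefit_treat_all(300, 1000))
    print(relative_utility(285, 140, 300, 1000))
```

Under this definition, a model is clinically useful at the chosen threshold when its Net Benefit exceeds that of both default strategies, which corresponds to a Relative Utility above zero.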