Objectives
To validate externally the performance of the Assessment of Different NEoplasias in the adneXa (ADNEX) model and compare this model with other frequently used models in the differentiation between benign and malignant adnexal masses.

Methods
In this retrospective diagnostic accuracy study, we assessed data collected prospectively from patients with adnexal pathology who underwent real‐time transvaginal or transrectal ultrasound by a single expert ultrasonographer in a tertiary care hospital between July 2011 and July 2015. The presence of a malignancy was determined by subjective assessment and use of four prediction models: the ADNEX model, simple ultrasound‐based rules (simple rules), Logistic Regression model 2 (LR2) and the Risk of Malignancy Index (RMI), of which three different variants were assessed. Pathology was the clinical reference standard.

Results
In total, 851 consecutive patients underwent ultrasound examination for an adnexal mass. For 326 patients (128 premenopausal and 198 postmenopausal), pathology results were available (211 (64.7%) benign; 115 (35.3%) malignant) and these were included in the analysis. The area under the receiver–operating characteristics curve (AUC) of the ADNEX model for the discrimination between benign and malignant tumors was 0.93 (95% CI, 0.89–0.95). AUCs for the subtypes of malignancy (i.e. borderline, Stage I–IV and metastatic adnexal tumors) ranged between 0.60 and 0.90. Only subjective assessment (AUC, 0.96 (95% CI, 0.93–0.98)) was superior to the ADNEX model (P = 0.01) in differentiating malignant from benign tumors. AUCs for the other models were 0.92 (95% CI, 0.89–0.95) for LR2, 0.85 (95% CI, 0.81–0.89) for RMI‐I, 0.82 (95% CI, 0.77–0.86) for RMI‐II and 0.84 (95% CI, 0.80–0.88) for RMI‐III. At the proposed cut‐off of ≥ 10%, the ADNEX model had the highest sensitivity (0.98 (95% CI, 0.93–1.00)) but the lowest specificity (0.62 (95% CI, 0.55–0.68)) compared with the other models.
Both subjective assessment (sensitivity, 0.90 (95% CI, 0.83–0.95); specificity, 0.91 (95% CI, 0.86–0.94)) and the simple rules model with inconclusive cases classified by subjective assessment (sensitivity, 0.89 (95% CI, 0.81–0.94); specificity, 0.90 (95% CI, 0.85–0.94)) had lower sensitivity than the ADNEX model, but their sensitivity and specificity were better balanced.

Conclusions
Although the test performance of subjective assessment by an expert remains superior, the ADNEX model can help in the differentiation between benign and malignant ovarian tumors. The advantage of the ADNEX model as a polytomous model remains to be shown.

© 2016 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of the International Society of Ultrasound in Obstetrics and Gynecology.
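
The Results report sensitivity and specificity for the ADNEX model at a risk cut‐off of ≥ 10%. As an illustration only, the following minimal Python sketch shows how these two measures follow from dichotomizing predicted risks at a cut‐off. The function name `sensitivity_specificity` and the toy risks and labels are assumptions made for this example; they are not study data and this is not the authors' analysis code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    risks: Sequence[float],
    is_malignant: Sequence[bool],
    cutoff: float = 0.10,
) -> Tuple[float, float]:
    """Dichotomize predicted risks at `cutoff` (risk >= cutoff counts as
    test-positive) and return (sensitivity, specificity) against the
    reference standard labels."""
    tp = fn = tn = fp = 0
    for risk, malignant in zip(risks, is_malignant):
        predicted_malignant = risk >= cutoff
        if malignant and predicted_malignant:
            tp += 1  # malignant, correctly flagged
        elif malignant:
            fn += 1  # malignant, missed
        elif predicted_malignant:
            fp += 1  # benign, falsely flagged
        else:
            tn += 1  # benign, correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (NOT from the study): predicted risk of malignancy
# and the reference-standard label for eight imaginary masses.
toy_risks = [0.02, 0.08, 0.15, 0.40, 0.05, 0.90, 0.12, 0.60]
toy_labels = [False, False, False, True, False, True, False, True]

sens_10, spec_10 = sensitivity_specificity(toy_risks, toy_labels, cutoff=0.10)
sens_50, spec_50 = sensitivity_specificity(toy_risks, toy_labels, cutoff=0.50)
print(f"cut-off 10%: sensitivity={sens_10:.2f}, specificity={spec_10:.2f}")
print(f"cut-off 50%: sensitivity={sens_50:.2f}, specificity={spec_50:.2f}")
```

In this toy example, the low cut‐off flags every malignant case at the cost of more false positives among benign masses, while the higher cut‐off reverses the trade‐off. This is the same pattern of high sensitivity and lower specificity described for the ADNEX model at ≥ 10%.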