Purpose
To establish the clinical applicability of deep‐learning organ‐at‐risk (OAR) autocontouring models (DL‐AC) for brain radiotherapy. The dosimetric impact of editing contours prior to model training was evaluated for both CT‐ and MRI‐based models. The correlation between geometric and dosimetric measures was also investigated to establish whether dosimetric assessment is required for clinical validation.

Method
CT‐ and MRI‐based deep‐learning autosegmentation models were trained using edited and unedited clinical contours. Autosegmentations were dosimetrically compared to gold standard contours for a test cohort. D1%, D5%, D50%, and maximum dose were used as clinically relevant dosimetric measures. The statistical significance of dosimetric differences between the gold standard contours and autocontours was established using paired Student's t‐tests. Clinically significant cases were identified via the dosimetric headroom to the OAR tolerance. Pearson correlation coefficients were used to investigate the relationship between geometric measures and absolute percentage dose changes for each autosegmentation model.

Results
Except for the right orbit when delineated using MRI models, the statistical analysis revealed no superior model in terms of dosimetric accuracy, either between the CT DL‐AC models or between the MRI DL‐AC models, for any investigated brain OAR. For all autosegmentation models, the number of patients in whom the clinical significance threshold was exceeded was higher for the optic chiasm D1% than for other OARs. A weak correlation was consistently observed between the outcomes of the dosimetric and geometric evaluations.

Conclusions
Editing contours before training the DL‐AC model had no significant impact on dosimetry. The geometric test metrics were inadequate to estimate the impact of contour inaccuracies on dose. Accordingly, dosimetric analysis is needed to evaluate the clinical applicability of DL‐AC models in the brain.
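
As an illustration of the statistical comparison described in the Method, the following is a minimal sketch and not the authors' implementation. It assumes per-patient dose values (for example D1% in Gy) are available as plain lists for the gold standard and autocontoured structures, and that a geometric measure (here assumed to be a Dice score, which the abstract does not name) is available per patient. All function names and all numbers are hypothetical and for illustration only.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired Student's t-test of a vs b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean_d / math.sqrt(var_d / n), n - 1


def abs_pct_dose_change(gold, auto):
    """Absolute percentage dose change of each autocontour value vs gold standard."""
    return [abs(a - g) / g * 100.0 for g, a in zip(gold, auto)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four patients (not data from the study).
gold_d1 = [20.0, 30.0, 40.0, 50.0]   # gold standard contour D1%, Gy
auto_d1 = [21.0, 30.0, 42.0, 51.0]   # autocontour D1%, Gy
dice = [0.80, 0.95, 0.78, 0.90]      # assumed geometric measure per patient

t_stat, dof = paired_t_statistic(auto_d1, gold_d1)
pct_change = abs_pct_dose_change(gold_d1, auto_d1)
r = pearson_r(dice, pct_change)
```

The sketch stops at the t statistic and its degrees of freedom; obtaining a p-value requires the Student's t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.pearsonr`) provides.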