Objective

Automatic segmentation of vestibular schwannoma (VS) from routine clinical MRI can improve clinical workflow, facilitate treatment decisions, and assist patient management. Excellent automatic segmentation results have previously been achieved on datasets of standardised MRI images acquired for stereotactic surgery planning. However, diagnostic clinical datasets are generally more diverse and pose a greater challenge to automatic segmentation algorithms. Here, we show that automatic segmentation of VS on such datasets is also possible with high accuracy.

Methods

We acquired a large multi-centre routine clinical (MC-RC) dataset of 168 patients with a single sporadic VS who were referred from 10 medical sites and consecutively seen at a single centre. Up to three longitudinal MRI exams were selected for each patient. Selection rules based on image modality, resolution, orientation, and acquisition timepoint were defined to automatically select contrast-enhanced T1-weighted (ceT1w) images (n=130) and T2-weighted (T2w) images (n=379). Manual ground-truth segmentations were obtained in an iterative process in which segmentations were 1) produced or amended by a specialised company, 2) reviewed by one of three trained radiologists, and 3) validated by an expert team. Inter- and intra-observer reliability was assessed on a subset of 10 ceT1w and 41 T2w images. The MC-RC dataset was split randomly into three non-overlapping sets for model training, hyperparameter tuning, and testing in proportions of 70/10/20%. We applied deep learning to train our VS segmentation model, based on convolutional neural networks (CNNs) within the nnU-Net framework.

Results

Our model achieved excellent Dice scores when evaluated on the MC-RC testing set as well as on the public testing set.
On the MC-RC testing set, Dice scores were 90.8±4.5% for ceT1w, 86.1±11.6% for T2w, and 82.3±18.4% for a combined ceT1w+T2w input.

Conclusions

We developed a model for automatic VS segmentation on diverse multi-centre clinical datasets. The results show that the performance of the framework is comparable to that of human annotators. In contrast, a model trained on a publicly available dataset acquired for Gamma Knife stereotactic radiosurgery did not perform well on the MC-RC testing set. The application of our model has the potential to greatly facilitate the management of patients in clinical practice. Our pre-trained segmentation models are made available online. Moreover, we are in the process of making the MC-RC dataset publicly available.
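The Dice scores reported above measure the volumetric overlap between a predicted segmentation and the ground truth. A minimal sketch of that computation on binary masks follows; the function name and the convention of returning 100% for two empty masks are illustrative assumptions, not part of the paper's evaluation code.

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks, in percent:
    Dice = 2 * |pred AND truth| / (|pred| + |truth|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 100.0  # assumed convention: two empty masks agree perfectly
    return 100.0 * 2.0 * intersection / denom

# Toy example: two overlapping 1-D "masks" sharing 2 of their 3 voxels each
a = np.array([1, 1, 1, 0, 0])
b = np.array([0, 1, 1, 1, 0])
print(round(dice_score(a, b), 1))  # → 66.7
```

In practice the masks would be 3-D arrays loaded from the ceT1w or T2w segmentation volumes; the same formula applies unchanged.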