Background: Deep learning (DL) models for auto-segmentation in radiotherapy have been extensively studied in retrospective and pilot settings. However, such studies might not reflect routine clinical use. This study compares the performance and acceptability of a clinically implemented, in-house-trained DL segmentation model for breast cancer with the results of a previously performed pilot study.
Material and methods: Sixty patients receiving whole-breast radiotherapy, with or without an indication for locoregional radiotherapy, were included. Structures were qualitatively scored by radiotherapy technologists and radiation oncologists. Quantitative evaluation was performed using the Dice similarity coefficient (DSC), the 95th percentile Hausdorff distance (95%HD), and the surface DSC (sDSC), and the time needed for generating, checking, and correcting structures was measured.
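For reference, the metrics follow their standard definitions; the sketch below uses our own notation, with A and B denoting the automated and manual segmentations, ∂ the segmentation surface, and a surface tolerance τ that is study-specific and not restated in this abstract:
\[
\mathrm{DSC}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
\mathrm{HD}_{95}(A,B) = P_{95}\bigl(\{d(a,\partial B)\}_{a\in\partial A} \cup \{d(b,\partial A)\}_{b\in\partial B}\bigr),
\]
\[
\mathrm{sDSC}_{\tau}(A,B) = \frac{|\partial A \cap B_{\tau}| + |\partial B \cap A_{\tau}|}{|\partial A| + |\partial B|},
\]
where \(P_{95}\) is the 95th percentile of the pooled surface-to-surface distances and \(A_{\tau}\), \(B_{\tau}\) are the regions within tolerance τ of the respective surfaces.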
Results: Ninety-three percent of all contours in clinical practice were scored as clinically acceptable or usable as a starting point, comparable to the 92% achieved in the pilot study. Compared to the pilot study, no significant change in time reduction was observed for organs at risk (OARs). For target volumes, significantly more time was needed than in the pilot study for patients whose treatment included lymph node levels 1–4, although the time reduction was still 33% compared to manual segmentation. Almost all contours had better DSC and 95%HD values than reported inter-observer variation. Only CTVn4 scored worse on both metrics, and the thyroid had a higher 95%HD value.
Interpretation: Performance of the DL model in clinical practice was comparable to that in the pilot study, with high acceptability rates and sustained time reduction.