Purpose
Task automation is essential for efficient and consistent image segmentation in radiation oncology. We report on a deep learning architecture comprising a U‐Net and a variational autoencoder (VAE) for automatic contouring of the prostate gland for radiotherapy treatment planning, incorporating interobserver variation. The U‐Net/VAE generates an ensemble of segmentations for each CT image slice. A novel outlier mitigation (OM) technique was implemented to improve the model's segmentation accuracy.
Methods
The primary source dataset (source_prim) consisted of 19 200 CT slices (from 300 patient planning CT image datasets) with manually contoured prostate glands. A smaller secondary source dataset (source_sec) comprised 640 CT slices (from 10 patient CT datasets), where prostate glands were segmented by 5 independent physicians on each dataset to account for interobserver variability. Data augmentation via random rotation (<5 degrees), cropping, and horizontal flipping was applied to each dataset to increase sample size by a factor of 100. A probabilistic hierarchical U‐Net with VAE was implemented and pretrained using the augmented source_prim dataset for 30 epochs. Model parameters of the U‐Net/VAE were fine‐tuned using the augmented source_sec dataset for 100 epochs. After the first round of training, outlier contours in the training dataset were automatically detected and replaced by the most accurate contours (based on Dice similarity coefficient, DSC) generated by the model. The U‐Net/OM‐VAE was retrained using the revised training dataset. Metrics for comparison included DSC, Hausdorff distance (HD, mm), normalized cross‐correlation (NCC) coefficient, and center‐of‐mass (COM) distance (mm).
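The DSC and COM metrics and the outlier replacement step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how outliers are detected, so the rule used here (a manual contour whose mean DSC against the other observers falls below `dsc_threshold`) is an assumption, as are the function names, the threshold value, and the pixel spacing argument.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient (DSC) between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


def com_distance_mm(a, b, spacing_mm=(1.0, 1.0)):
    """Distance (mm) between the centers of mass of two non-empty masks."""
    ca = np.argwhere(np.asarray(a, dtype=bool)).mean(axis=0)
    cb = np.argwhere(np.asarray(b, dtype=bool)).mean(axis=0)
    return float(np.linalg.norm((ca - cb) * np.asarray(spacing_mm)))


def replace_outliers(observer_masks, model_samples, dsc_threshold=0.7):
    """Replace outlier manual contours with the best model-generated contour.

    ASSUMPTION: a contour is an outlier when its mean DSC against the other
    observers is below dsc_threshold. The source only states that outliers
    were detected automatically and replaced by the most accurate model
    contour according to DSC.
    """
    n = len(observer_masks)
    agreement = [
        np.mean([dice(observer_masks[i], observer_masks[j])
                 for j in range(n) if j != i])
        for i in range(n)
    ]
    outliers = [i for i, s in enumerate(agreement) if s < dsc_threshold]
    inliers = [observer_masks[i] for i in range(n) if i not in outliers]
    if not outliers or not inliers:
        return list(observer_masks), []  # nothing to replace, or no reference

    # Score each model sample by its mean DSC against the retained contours.
    scores = [np.mean([dice(s, m) for m in inliers]) for s in model_samples]
    best = model_samples[int(np.argmax(scores))]
    revised = [best if i in outliers else m
               for i, m in enumerate(observer_masks)]
    return revised, outliers
```

The revised contour list would then serve as the training target for the second round of training. In practice the threshold would need tuning against the observed interobserver spread.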
Results
Results for U‐Net/OM‐VAE with outliers replaced in the training dataset versus U‐Net/VAE without OM were as follows: DSC = 0.82 ± 0.01 versus 0.80 ± 0.02 (p = 0.019), HD = 9.18 ± 1.22 versus 10.18 ± 1.35 mm (p = 0.043), NCC = 0.59 ± 0.07 versus 0.62 ± 0.06, and COM = 3.36 ± 0.81 versus 4.77 ± 0.96 mm for the average of 15 contours. For the average of 15 highest accuracy contours, values were as follows: DSC = 0.90 ± 0.02 versus 0.85 ± 0.02, HD = 5.47 ± 0.02 versus 7.54 ± 1.36 mm, and COM = 1.03 ± 0.58 versus 1.46 ± 0.68 mm (p < 0.03 for all metrics). Results for the U‐Net/OM‐VAE with outliers removed were as follows: DSC = 0.78 ± 0.01, HD = 10.65 ± 1.95 mm, NCC = 0.46 ± 0.10, COM = 4.17 ± 0.79 mm for the average of 15 contours, and DSC = 0.88 ± 0.02, HD = 7.00 ± 1.17 mm, COM = 1.58 ± 0.63 mm for the average of 15 highest accuracy contours. All metrics for the U‐Net/VAE trained on the source_prim and source_sec datasets via pretraining followed by fine‐tuning showed statistically significant improvement over those for the model trained on the source_sec dataset only. Finally, all metrics for U‐Net/VAE with or without OM showed statistically significant improvement over those for the standard U‐Net.
Conclusions
A VAE combined with a hierarchical U‐Net and an OM strategy (U‐Net/OM‐VAE) demonstrates promise toward capt...