Purpose
The goal of this study was to generate a large treatment plan database for head and neck (H&N) cancer patients that can be considered the gold standard for training and validating models for knowledge‐based (KB) treatment planning and quality assurance (QA). With this dataset, the intrinsic prediction performance, the effect of interorgan dependency, and the impact of dataset inconsistency were investigated for an existing treatment planning QA model.
Methods
The CT scans of 108 previously treated oropharyngeal patients were used to establish the plan database. For each patient, 15 Pareto optimal treatment plans with different planning priorities for the parotid glands were generated with fully automatic multicriterial treatment planning (1620 plans in total). For each of the 15 sets of plans in the database, a KB model was trained with 54 patients and validated on the other 54 by comparing the predictions with the achieved doses. The dose prediction accuracy (predicted minus achieved) of the KB models was assessed and compared among the different models to characterize the intrinsic performance and the effect of interorgan dependency. In addition, the effect of dataset inconsistency with respect to planning prioritization was investigated by mixing plans with different prioritizations in the training dataset, in the validation dataset, and in both.
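As an illustration of the accuracy metric described above, the following minimal Python sketch computes the prediction error (predicted minus achieved) per patient and summarizes it as mean and SD. The function name `prediction_error_summary` and the dose values are hypothetical and not taken from the study, and the use of the sample SD (n − 1 denominator) is an assumption, since the abstract does not state which SD definition was used.

```python
import statistics


def prediction_error_summary(predicted_gy, achieved_gy):
    """Return (mean, SD) of per-patient prediction error in Gy.

    The error is defined as predicted minus achieved, so a positive mean
    indicates that the model predicts higher doses than were achieved.
    The SD here is the sample SD (assumption, not stated in the source).
    """
    if len(predicted_gy) != len(achieved_gy):
        raise ValueError("predicted and achieved doses must have equal length")
    if len(predicted_gy) < 2:
        raise ValueError("at least two patients are needed to compute an SD")
    errors = [p - a for p, a in zip(predicted_gy, achieved_gy)]
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical parotid mean doses (Gy) for four validation patients.
predicted = [25.0, 30.5, 18.0, 41.0]
achieved = [24.0, 31.5, 17.0, 40.0]

mean_error, sd_error = prediction_error_summary(predicted, achieved)
print(f"prediction error: {mean_error:.1f} ± {sd_error:.1f} Gy")
```

In the study design described above, such a summary would be computed over the 54 validation patients for each of the 15 plan sets.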
Results
When the parotid glands had a high planning priority, the mean ± SD of the prediction error for the mean parotid gland dose was only 0.2 ± 2.2 Gy, but this increased to 1.0 ± 5.0 Gy when the parotid glands had a low planning priority. Dataset inconsistency (in planning priority) led to a large increase in prediction error for the parotid glands (mean ± SD) from 0.2 ± 2.2 Gy to 2.8 ± 3.3 Gy, −3.2 ± 5.0 Gy, or −0.6 ± 5.4 Gy, depending on how the datasets were mixed.
Conclusions
The generated plan database can be used to validate and characterize KB prediction models for H&N cancer and will be made available upon request. The investigated KB model performed well when the parotid glands had a high planning priority (little dependence on lower priority organs at risk [OARs]), but poorly for organs whose dose strongly depends on other, higher priority OARs. To improve the performance of KB prediction models for H&N cancer, interorgan dependency should be modeled and accounted for. Dataset inconsistency has a large negative impact on the prediction errors of KB models and should be avoided as much as possible.