Purpose
The current process for radiotherapy treatment plan quality assurance relies on human inspection of treatment plans, which is time-consuming, error-prone, and often reliant on inconsistently applied professional judgment. A previous proof-of-principle paper described the use of a Bayesian network (BN) to aid in this process. This work studied how such a BN could be expanded and trained to better represent clinical practice.
Methods
We obtained 51 540 unique radiotherapy cases, including diagnostic, prescription, plan/beam, and therapy setup factors, spanning the years 2010–2017, extracted from a de-identified Elekta oncology information system at a single institution. Using a knowledge base derived from clinical experience, these factors were organized into a 29-node, 40-edge BN representing dependencies among the variables. Conditional probabilities were machine-learned with an expectation–maximization algorithm using all data except a subset of 500 patient cases withheld for testing. Different classes of errors, drawn from incident learning systems, were then introduced into the withheld test cases. The network was trained on datasets of different sizes, as well as on training windows of different lengths and from different eras. Its performance under these conditions was evaluated by the area under the receiver operating characteristic curve (AUC).
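The evaluation pipeline described above can be sketched at toy scale: score each plan by its joint probability under a small discrete BN, treat low-probability plans as suspected errors, and compute an AUC against the known injected-error labels. Everything below is illustrative; the node names, probability values, and cases are invented for the sketch and are not from the study's 29-node network.

```python
# Toy sketch of BN-based plan checking: a 3-node chain
# diagnosis -> prescription -> technique, with hand-set CPTs.
# All names and numbers are hypothetical, not from the paper.

def joint_prob(case, cpts):
    """P(diagnosis, prescription, technique) for the chain BN."""
    d, p, t = case
    return cpts["D"][d] * cpts["P|D"][(p, d)] * cpts["T|P"][(t, p)]

cpts = {
    "D":   {"breast": 0.6, "lung": 0.4},
    "P|D": {("50Gy", "breast"): 0.9, ("60Gy", "breast"): 0.1,
            ("50Gy", "lung"): 0.2,  ("60Gy", "lung"): 0.8},
    "T|P": {("3DCRT", "50Gy"): 0.7, ("IMRT", "50Gy"): 0.3,
            ("3DCRT", "60Gy"): 0.2, ("IMRT", "60Gy"): 0.8},
}

# Withheld test cases: plausible plans plus injected errors (label 1).
normal = [("breast", "50Gy", "3DCRT"), ("lung", "60Gy", "IMRT")]
errors = [("breast", "60Gy", "3DCRT"), ("lung", "50Gy", "3DCRT")]

# Lower joint probability => more suspicious plan.
scores = [(joint_prob(c, cpts), 0) for c in normal] + \
         [(joint_prob(c, cpts), 1) for c in errors]

# AUC via pairwise rank comparison (Mann-Whitney formulation):
# fraction of (error, normal) pairs where the error scores lower.
pos = [s for s, y in scores if y == 1]
neg = [s for s, y in scores if y == 0]
auc = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg) \
      / (len(pos) * len(neg))
print(auc)  # prints 1.0: every injected error is less probable than every normal plan
```

In the study, the conditional probability tables were learned from tens of thousands of cases by expectation–maximization rather than set by hand, but the scoring and AUC evaluation follow the same pattern.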
Results
Our performance analysis found AUCs of 0.82, 0.85, 0.89, and 0.88 for networks trained with 2-yr, 3-yr, 4-yr, and 5-yr windows, respectively. With a 4-yr sliding window, AUC decreased by about 3% per year as the window was moved back in time in 1-yr steps. Comparing the 4-yr window moved back by 4 yrs (2010–2013 vs 2014–2017), the largest component of the overall reduction in AUC over time was the loss of detection performance for plan/beam error types.
Conclusions
The expanded BN method demonstrates the ability to detect classes of errors commonly encountered in radiotherapy planning. The results suggest that a 4-yr training window optimizes the performance of the network on this institutional dataset, and that yearly updates are sufficient to capture the evolution of clinical practice and maintain fidelity.