Background
A clinical study investigating the potential of prompt gamma imaging (PGI) for range verification in proton therapy (PT) is being carried out at our institution. Manual interpretation of the detected spot‐wise range shift information is time‐consuming and highly complex, and is therefore not feasible in broad routine application.
Purpose
Here, we present an approach to automatically detect and classify treatment deviations in realistically simulated PGI data for head‐and‐neck cancer (HNC) treatments using convolutional neural networks (CNNs) and conventional machine learning (ML) approaches.
Methods
For 12 HNC patients and 1 anthropomorphic head phantom (n = 13), pencil beam scanning (PBS) treatment plans were generated, and 1 field per plan was assumed to be monitored with a PGI slit camera system. In total, 386 scenarios representing different relevant or non‐relevant treatment deviations were simulated on planning and control CTs and manually classified into 7 classes: non‐relevant changes (NR) and relevant changes (RE) that trigger treatment intervention, caused by range prediction errors (±RP), setup errors in beam direction (±SE), anatomical changes (AC), or a combination of such errors (CB). PBS spots with reliable PGI information were placed at their nominal Bragg peak positions to generate two 3D spatial maps of 16 × 16 × 16 voxels containing the PGI‐determined range shift and the proton number information, respectively. Three complexity levels of simulated PGI data were investigated: (I) optimal PGI data, (II) realistic PGI data with simulated Poisson noise based on the locally delivered proton number, and (III) realistic PGI data with an additional positioning uncertainty of the slit camera following an experimentally determined distribution. For each complexity level, 3D‐CNNs were trained on a data subset (n = 9) using patient‐wise leave‐one‐out cross‐validation and tested on an independent test cohort (n = 4). Both the binary task of detecting RE and the multi‐class task of classifying the underlying error source were investigated. For performance comparison, four conventional ML classifiers (logistic regression, multilayer perceptron, random forest, and support vector machine) were trained on five previously established handcrafted features extracted from the PGI data.
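To illustrate how spot‐wise PGI results can be turned into fixed‐size CNN inputs, the following is a minimal sketch that bins spots at their nominal Bragg peak positions into a 16 × 16 × 16 range‐shift map and a proton‐number map. The helper name `spots_to_maps`, the bounding‐box grid extent, and the proton‐number‐weighted averaging of range shifts within a voxel are illustrative assumptions, not the study's documented procedure; the input values are toy numbers.

```python
import numpy as np


def spots_to_maps(positions, range_shifts, proton_numbers, grid=16):
    """Bin PBS spots into a range-shift map and a proton-number map.

    Hypothetical helper for illustration only.
    positions:      (n, 3) nominal Bragg peak positions, e.g. in mm
    range_shifts:   (n,) PGI-determined range shifts, e.g. in mm
    proton_numbers: (n,) delivered protons per spot
    """
    positions = np.asarray(positions, dtype=float)
    shifts = np.asarray(range_shifts, dtype=float)
    weights = np.asarray(proton_numbers, dtype=float)

    # Assumption: the grid spans the bounding box of the reliable spots.
    lo = positions.min(axis=0)
    hi = positions.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat axes
    idx = np.floor((positions - lo) / span * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)  # spots on the upper boundary go to the last voxel
    voxel = tuple(idx.T)  # (x_indices, y_indices, z_indices)

    proton_map = np.zeros((grid, grid, grid))
    weighted_shift = np.zeros((grid, grid, grid))
    # np.add.at accumulates correctly when several spots fall into one voxel.
    np.add.at(proton_map, voxel, weights)
    np.add.at(weighted_shift, voxel, weights * shifts)

    # Assumption: proton-number-weighted mean range shift per voxel; empty voxels stay 0.
    shift_map = np.divide(weighted_shift, proton_map,
                          out=np.zeros_like(weighted_shift), where=proton_map > 0)
    return shift_map, proton_map


# Toy example: two of the three spots share the last voxel.
shift_map, proton_map = spots_to_maps(
    positions=[[0, 0, 0], [10, 10, 10], [10, 10, 10]],
    range_shifts=[1.5, 2.0, -2.0],
    proton_numbers=[2e6, 1e6, 3e6],
)
cnn_input = np.stack([shift_map, proton_map])  # two input channels, shape (2, 16, 16, 16)
```

Stacking both maps as channels is one straightforward way to feed range shift and proton number jointly to a 3D‐CNN; weighting by proton number lets well‐sampled spots dominate a voxel's range shift.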
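Patient‐wise leave‐one‐out cross‐validation keeps all scenarios of one patient (or the phantom) together, so a fold never validates on a patient whose other scenarios were seen in training. The sketch below shows this split for the training subset and one plausible way to combine the fold models into an ensemble. The record layout (dicts with a `patient` key), the function names, and averaging predicted probabilities as the ensembling rule are assumptions for illustration; the abstract does not state how the CNN ensemble is formed.

```python
from statistics import fmean


def leave_one_patient_out(scenarios):
    """Yield (held_out_patient, train, validation) for patient-wise LOO CV.

    Each scenario is a dict with a 'patient' key (hypothetical record layout).
    All scenarios of a patient land on the same side of the split, so no
    patient contributes to both training and validation data of a fold.
    """
    for held_out in sorted({s["patient"] for s in scenarios}):
        train = [s for s in scenarios if s["patient"] != held_out]
        validation = [s for s in scenarios if s["patient"] == held_out]
        yield held_out, train, validation


def ensemble_probability(fold_models, sample):
    """Assumed ensembling rule: mean predicted probability of a relevant
    change over the models trained in the individual folds."""
    return fmean([model(sample) for model in fold_models])


# Toy scenario records (not study data): three patients, five scenarios.
scenarios = [{"patient": p, "scenario": i}
             for i, p in enumerate(["P1", "P1", "P2", "P3", "P3"])]
folds = list(leave_one_patient_out(scenarios))
```

With 9 training patients this yields 9 folds and hence 9 trained models, which can then be applied jointly to the independent test cohort.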
Results
On the test data, the CNN ensemble achieved a binary accuracy of 0.95, 0.96, and 0.93 and a multi‐class accuracy of 0.83, 0.81, and 0.76 for complexity levels (I), (II), and (III), respectively. For binary classification in the most realistic scenario (III), the CNN ensemble detected relevant treatment deviations with a sensitivity of 0.95 and a specificity of 0.88. The best‐performing conventional ML classifiers achieved similar test performance.
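The binary task treats every class except NR as RE, which is why binary accuracy can exceed multi‐class accuracy: confusing, for example, a range prediction error with an anatomical change is still a correct detection of a relevant change. The sketch below computes accuracy, sensitivity (fraction of RE scenarios detected), and specificity (fraction of NR scenarios correctly passed) under that mapping. The ASCII label strings and the toy predictions are hypothetical and do not reproduce the study's results.

```python
# Hypothetical label strings for the six relevant-change classes.
RELEVANT = {"+RP", "-RP", "+SE", "-SE", "AC", "CB"}


def to_binary(label):
    """Collapse the 7-class label to the binary NR/RE task."""
    return "RE" if label in RELEVANT else "NR"


def accuracy(y_true, y_pred):
    """Fraction of scenarios whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity with RE as the positive class."""
    pairs = [(to_binary(t), to_binary(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t == "RE" and p == "RE" for t, p in pairs)
    fn = sum(t == "RE" and p == "NR" for t, p in pairs)
    tn = sum(t == "NR" and p == "NR" for t, p in pairs)
    fp = sum(t == "NR" and p == "RE" for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (not study data).
y_true = ["NR", "NR", "+RP", "-SE", "AC", "CB"]
y_pred = ["NR", "AC", "+RP", "-RP", "AC", "NR"]
multi_acc = accuracy(y_true, y_pred)
binary_acc = accuracy([to_binary(t) for t in y_true], [to_binary(p) for p in y_pred])
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Here the "-SE" scenario predicted as "-RP" counts as an error in the multi‐class task but as a correct detection in the binary task.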
Conclusions
This study demonstrates that CNNs can reliably detect relevant changes in realistically simulated PGI data and classify most of the underlying sources of treatment deviations. The CNNs extracted meaningful features from the PGI data with a performance comparable ...