PurposeIdentify Oropharyngeal cancer (OPC) patients at high-risk of developing long-term severe radiation-associated symptoms using dose volume histograms for organs-at-risk, via unsupervised clustering.Material and methodsAll patients were treated using radiation therapy for OPC. Dose-volume histograms of organs-at-risk were extracted from patients’ treatment plans. Symptom ratings were collected via the MD Anderson Symptom Inventory (MDASI) given weekly during, and 6 months post-treatment. Drymouth, trouble swallowing, mucus, and vocal dysfunction were selected for analysis in this study. Patient stratifications were obtained by applying Bayesian Mixture Models with three components to patient’s dose histograms for relevant organs. The clusters with the highest total mean doses were translated into dose thresholds using rule mining. Patient stratifications were compared against Tumor staging information using multivariate likelihood ratio tests. Model performance for prediction of moderate/severe symptoms at 6 months was compared against normal tissue complication probability (NTCP) models using cross-validation.ResultsA total of 349 patients were included for long-term symptom prediction. High-risk clusters were significantly correlated with outcomes for severe late drymouth (p <.0001, OR = 2.94), swallow (p = .002, OR = 5.13), mucus (p = .001, OR = 3.18), and voice (p = .009, OR = 8.99). Simplified clusters were also correlated with late severe symptoms for drymouth (p <.001, OR = 2.77), swallow (p = .01, OR = 3.63), mucus (p = .01, OR = 2.37), and voice (p <.001, OR = 19.75). Proposed cluster stratifications show better performance than NTCP models for severe drymouth (AUC.598 vs.559, MCC.143 vs.062), swallow (AUC.631 vs.561, MCC.20 vs -.030), mucus (AUC.596 vs.492, MCC.164 vs -.041), and voice (AUC.681 vs.555, MCC.181 vs -.019). Simplified dose thresholds also show better performance than baseline models for predicting late severe ratings for all symptoms.ConclusionOur results show that leveraging the 3-D dose histograms from radiation therapy plan improves stratification of patients according to their risk of experiencing long-term severe radiation associated symptoms, beyond existing NTPC models. Our rule-based method can approximate our stratifications with minimal loss of accuracy and can proactively identify risk factors for radiation-associated toxicity.