BackgroundAccurate tumor target contouring and T staging are vital for precision radiation therapy in nasopharyngeal carcinoma (NPC). Identifying T-stage and contouring the Gross tumor volume (GTV) manually is a laborious and highly time-consuming process. Previous deep learning-based studies have mainly been focused on tumor segmentation, and few studies have specifically addressed the tumor staging of NPC.ObjectivesTo bridge this gap, we aim to devise a model that can simultaneously identify T-stage and perform accurate segmentation of GTV in NPC.Materials and methodsWe have developed a transformer-based multi-task deep learning model that can perform two tasks simultaneously: delineating the tumor contour and identifying T-stage. Our retrospective study involved contrast-enhanced T1-weighted images (CE-T1WI) of 320 NPC patients (T-stage: T1-T4) collected between 2017 and 2020 at our institution, which were randomly allocated into three cohorts for three-fold cross-validations, and conducted the external validation using an independent test set. We evaluated the predictive performance using the area under the receiver operating characteristic curve (ROC-AUC) and accuracy (ACC), with a 95% confidence interval (CI), and the contouring performance using the Dice similarity coefficient (DSC) and average surface distance (ASD).ResultsOur multi-task model exhibited sound performance in GTV contouring (median DSC: 0.74; ASD: 0.97 mm) and T staging (AUC: 0.85, 95% CI: 0.82–0.87) across 320 patients. In early T category tumors, the model achieved a median DSC of 0.74 and ASD of 0.98 mm, while in advanced T category tumors, it reached a median DSC of 0.74 and ASD of 0.96 mm. The accuracy of automated T staging was 76% (126 of 166) for early stages (T1-T2) and 64% (99 of 154) for advanced stages (T3-T4). Moreover, experimental results show that our multi-task model outperformed the other single-task models.ConclusionsThis study emphasized the potential of multi-task model for simultaneously delineating the tumor contour and identifying T-stage. The multi-task model harnesses the synergy between these interrelated learning tasks, leading to improvements in the performance of both tasks. The performance demonstrates the potential of our work for delineating the tumor contour and identifying T-stage and suggests that it can be a practical tool for supporting clinical precision radiation therapy.