PET/CT images provide a rich data source for clinical prediction models in head and neck squamous cell carcinoma (HNSCC). Deep learning models often use images in an end-to-end fashion, either alone or together with clinical data, to generate predictions. However, in HNSCC, the tumor region of interest may serve as an informative prior that improves prediction performance. In this study, we use a deep learning framework based on a DenseNet architecture that combines PET images, CT images, primary tumor segmentation masks, and clinical data as separate channels to predict progression-free survival (PFS) in days for HNSCC patients. In internal validation (10-fold cross-validation) on a large set of training data provided by the 2021 HECKTOR Challenge, we achieve mean C-indices of 0.855 ± 0.060 and 0.650 ± 0.074 when observed events are and are not included in the C-index calculation, respectively. Ensemble approaches applied to the cross-validation folds yield C-index values of up to 0.698 on the independent test set (external validation). Importantly, in both internal and external validation, models that use the segmentation mask achieve a higher C-index than models that do not, which underscores the value of the added mask. These promising results highlight the utility of including segmentation masks as additional input channels in deep learning pipelines for clinical outcome prediction in HNSCC.
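
Because two C-index values are reported depending on how observed events enter the calculation, a small worked example may help show why the two numbers can differ. The abstract does not spell out the exact computation, so the following is only a minimal sketch of a Harrell-style concordance index under our own assumptions: predictions are survival times (larger means later progression), "events included" means that a pair is comparable only when the patient with the shorter follow-up had an observed event, and "events not included" means that every patient is treated as having an observed event. The function name `concordance_index` and all toy values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, predicted, events=None):
    """Harrell-style C-index for predicted survival times (illustrative sketch).

    times:     observed follow-up times, e.g. in days
    predicted: predicted times; larger should mean later progression
    events:    1 if progression was observed, 0 if censored;
               None treats every patient as having an observed event
               (an assumed reading of "events not included")
    """
    if events is None:
        events = [1] * len(times)
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this simplified sketch
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier time is censored, so the true ordering is unknown
        comparable += 1
        if predicted[a] < predicted[b]:
            concordant += 1.0  # predicted ordering matches observed ordering
        elif predicted[a] == predicted[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy, made-up data for four hypothetical patients (not from the study).
times = [100, 200, 300, 400]      # observed follow-up in days
events = [1, 0, 1, 0]             # 1 = progression observed, 0 = censored
predicted = [150, 120, 310, 500]  # predicted PFS in days

c_with_events = concordance_index(times, predicted, events)
c_without_events = concordance_index(times, predicted)
print(c_with_events, c_without_events)
```

In this toy case the censoring-aware variant uses four comparable pairs and the all-events variant uses six, so the same predictions yield different scores. The example only illustrates the mechanism and does not reproduce the reported values.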