When clinicians assess the prognosis of patients in intensive care, they take both imaging and non-imaging data into account. In contrast, many traditional machine learning models rely on only one of these modalities, which limits their potential in medical applications. This work proposes and evaluates a novel transformer-based neural network architecture that integrates multimodal patient data, i.e., imaging data (chest radiographs) and non-imaging data (clinical data). We evaluate the model in a retrospective study of 6,125 patients in intensive care. For predicting in-hospital survival of individual patients, the combined model (area under the receiver operating characteristic curve [AUROC] = 0.863) outperforms both the radiographs-only model (AUROC = 0.811, p < 0.001) and the clinical data-only model (AUROC = 0.785, p < 0.001). Furthermore, we demonstrate that the proposed model remains robust when not all clinical data points are available.
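For readers unfamiliar with the AUROC metric reported above, the following is a minimal sketch of how it can be computed from binary outcome labels and model scores. It uses the pairwise (Mann-Whitney) interpretation: the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study, and the sketch does not reproduce the significance testing behind the reported p-values.

```python
def auroc(labels, scores):
    """Return the AUROC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not study data): 3 positives, 3 negatives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The quadratic pairwise loop is chosen for clarity rather than speed; for cohorts of thousands of patients, a rank-based implementation or an established library routine would be the more practical choice.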