Background: The timeliness of detection of a sepsis incidence in progress is a crucial factor in the outcome for the patient. Machine learning models built from data in electronic health records can be used as an effective tool for improving this timeliness, but so far the potential for clinical implementations has been largely limited to studies in intensive care units. This study will employ a richer data set that will expand the applicability of these models beyond intensive care units. Furthermore, we will circumvent several important limitations that have been found in the literature: 1) Models are evaluated shortly before sepsis onset without considering interventions already initiated. 2) Machine learning models are built on a restricted set of clinical parameters, which are not necessarily measured in all departments. 3) Model performance is limited by current knowledge of sepsis, as feature interactions and time dependencies are hard-coded into the model. Methods: In this study, we present a model to overcome these shortcomings using a deep learning approach on a diverse multicenter data set. We used retrospective data from multiple Danish hospitals over a seven-year period. Our sepsis detection system is constructed as a combination of a convolutional neural network and a long short-term memory network. We suggest a retrospective assessment of interventions by looking at intravenous antibiotics and blood cultures preceding the prediction time. Results: Results show performance ranging from AUROC 0.856 (3 hours before sepsis onset) to AUROC 0.756 (24 hours before sepsis onset). Evaluating the clinical utility of the model, we find that a large proportion of septic patients did not receive antibiotic treatment or blood culture at the time of the sepsis prediction, and the model could therefore facilitate such interventions at an earlier point in time. Conclusion: We present a deep learning system for early detection of sepsis that is able to learn characteristics of the key factors and interactions from the raw event sequence data itself, without relying on a labor-intensive feature extraction work. Our system outperforms baseline models, such as gradient boosting, which rely on specific data elements and therefore suffer from many missing values in our dataset.