Purpose
In-hospital health-related adverse events (HAEs) are a major concern for hospitals worldwide. In high-income countries, approximately 1 in 10 patients experience an HAE associated with their hospital stay. Estimating each patient's risk of an HAE as accurately as possible is one of the first steps towards improving patient outcomes. Risk assessment can help healthcare providers target resources to the patients in greatest need by adapting processes and procedures. Electronic health data make it possible to apply machine-learning methods to risk analysis. We aim, first, to reveal correlations between HAE occurrence and patients' characteristics and/or the procedures they undergo during their hospitalisation, and second, to build models that allow the early identification of patients at elevated risk of HAE.

Participants
143 865 adult patients hospitalised at Grenoble Alpes University Hospital (France) between 1 January 2016 and 31 December 2018.

Findings to date
In this set-up phase of the project, we describe the preconditions for big data analysis using machine-learning methods. We present an overview of the retrospective, de-identified, multisource data for the 3-year study period. These data were extracted from the hospital's Clinical Data Warehouse and combined with social determinants of health data from the National Institute of Statistics and Economic Studies, to be used for machine-learning (artificial intelligence) training and validation. The risk-assessment system will not require medical staff to enter any supplementary information or carry out any additional evaluation.

Future plans
We are using this data set to develop predictive models for several general HAEs, including secondary intensive care admission, prolonged hospital stay, 7-day and 30-day re-hospitalisation, nosocomial bacterial infection, hospital-acquired venous thromboembolism, and in-hospital mortality.
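Outcomes such as 7-day and 30-day re-hospitalisation must be turned into per-stay labels before a predictive model can be trained. The sketch below shows one simple way this could be done. It is not the project's definition, and several things about it are assumptions. The `Stay` record, its field names and the `readmission_labels` helper are hypothetical. It also assumes that a stay counts as "readmitted" when the same patient's next admission starts within the window after discharge. No exclusions are applied, so planned readmissions, transfers and in-hospital deaths are all treated the same way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Stay:
    # Hypothetical minimal stay record; a real warehouse extract has many more fields.
    patient_id: str
    admission: date
    discharge: date


def readmission_labels(stays, window_days):
    """Flag each stay that is followed by a new admission of the same patient
    within `window_days` days of discharge (assumed definition, no exclusions)."""
    # Group stays by patient.
    by_patient = {}
    for stay in stays:
        by_patient.setdefault(stay.patient_id, []).append(stay)

    labels = {}
    for pid, patient_stays in by_patient.items():
        # Order each patient's stays chronologically.
        patient_stays.sort(key=lambda s: s.admission)
        # Pair every stay with the next one (None for the patient's last stay).
        for current, nxt in zip(patient_stays, patient_stays[1:] + [None]):
            readmitted = (
                nxt is not None
                and 0 <= (nxt.admission - current.discharge).days <= window_days
            )
            labels[(pid, current.admission)] = readmitted
    return labels


if __name__ == "__main__":
    example = [
        Stay("A", date(2017, 1, 1), date(2017, 1, 5)),
        Stay("A", date(2017, 1, 10), date(2017, 1, 12)),
        Stay("A", date(2017, 3, 1), date(2017, 3, 3)),
        Stay("B", date(2017, 2, 1), date(2017, 2, 2)),
    ]
    print(readmission_labels(example, 7))
    print(readmission_labels(example, 30))
```

In this toy data set, patient A's first stay is followed by a new admission 5 days after discharge, so it is flagged under both the 7-day and the 30-day windows. The next gap is 48 days, so the second stay is flagged under neither window. A real study definition would add exclusion rules on top of this basic pairing logic.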