Across jurisdictions, government and health insurance providers hold a large amount of data from patient interactions with the healthcare system. We aimed to develop a machine learning-based model for predicting adverse outcomes due to diabetes complications using administrative health data from the single-payer health system in Ontario, Canada. A Gradient Boosting Decision Tree model was trained on data from 1,029,366 patients, validated on 272,864 patients, and tested on 265,406 patients. Discrimination was assessed using the AUC statistic, and calibration was assessed visually using calibration plots, both overall and across population subgroups. Our model predicting three-year risk of adverse outcomes due to diabetes complications (hyper/hypoglycemia, tissue infection, retinopathy, cardiovascular events, amputation) included 700 features from multiple diverse data sources and had strong discrimination (average test AUC = 77.7, range 77.7–77.9). Through the design and validation of a high-performance model to predict adverse outcomes from diabetes complications at the population level, we demonstrate the potential of machine learning and administrative health data to inform health planning and healthcare resource allocation for diabetes management.
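The visual calibration check mentioned above plots mean predicted risk against observed event rate within risk-ordered bins. The following is a minimal sketch of how such points might be computed, not the authors' method; the function name `calibration_points`, the equal-size binning, and the toy inputs are illustrative assumptions.

```python
def calibration_points(predicted, observed, n_bins=10):
    """Return (mean predicted risk, observed event rate) for each risk-ordered bin.

    Hypothetical helper: equal-size bins over sorted predictions are an
    assumption, not a detail taken from the study.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Sort patients by predicted risk so each bin holds similar-risk patients.
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # more bins than patients: skip empty bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        points.append((mean_pred, obs_rate))
    return points


# Toy data: a well-calibrated model has points close to the diagonal.
toy_points = calibration_points([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```

Calling the same function separately on each population subgroup (for example by sex or region) would give the per-subgroup curves.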
Key Points
Question
Can a machine learning model trained on routinely collected administrative health data be used to accurately predict the onset of type 2 diabetes at the population level?
Findings
In this decision analytical model study of 2.1 million residents in Ontario, Canada, a machine learning model was developed with high discrimination, population-level calibration, and calibration across population subgroups.
Meaning
Study results suggest that machine learning and administrative health data can be used to create population health planning tools that accurately discriminate between high- and low-risk groups to guide investments and targeted interventions for diabetes prevention.
Objective
To predict older adults’ risk of avoidable hospitalisation related to ambulatory care sensitive conditions (ACSC) using machine learning applied to administrative health data of Ontario, Canada.
Design, setting and participants
A retrospective cohort study was conducted on a large cohort of all residents covered under a single-payer system in Ontario, Canada over a period of 10 years (2008–2017). The study included 1.85 million Ontario residents between 65 and 74 years old at any time throughout the study period.
Data sources
Administrative health data from Ontario, Canada obtained from the ICES (formerly known as the Institute for Clinical Evaluative Sciences) Data Repository.
Main outcome measures
Risk of hospitalisations due to ACSCs 1 year after the observation period.
Results
The study used a total of 1 854 116 patients, split into train, validation and test sets. The ACSC incidence rate among the data points was 1.1% in all sets. The final XGBoost model achieved an area under the receiver operating characteristic curve of 80.5% and an area under the precision–recall curve of 0.093 on the test set, and the predictions were well calibrated, including in key subgroups. When ranking the model predictions, the top 5% of patients by predicted risk captured 37.4% of those who presented with an ACSC-related hospitalisation. A variety of features, such as the previous number of ambulatory care visits, presence of ACSC-related hospitalisations during the observation window, age, rural residence and prescription of certain medications, contributed to the prediction. Our model was also able to capture the geospatial heterogeneity of ACSC risk in Ontario, especially the elevated risk in rural and marginalised regions.
Conclusions
This study aimed to predict the 1-year risk of hospitalisation from ambulatory care sensitive conditions in seniors aged 65–74 years with a single, large-scale machine learning model. The model shows the potential to inform population health planning and interventions to reduce the burden of ACSC-related hospitalisations.
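The "top 5% of risk captured 37.4%" figure is a capture rate: the share of all observed events that fall among the highest-ranked predictions. A minimal sketch of that calculation follows, assuming binary outcomes and one score per patient; the function name `capture_at_top_fraction` and the toy data are hypothetical and not taken from the study.

```python
def capture_at_top_fraction(risk_scores, outcomes, fraction=0.05):
    """Share of all events found among the top `fraction` of patients by predicted risk.

    Hypothetical helper: outcomes are assumed to be 0/1, and ties in risk
    scores are broken arbitrarily by the sort.
    """
    if len(risk_scores) != len(outcomes):
        raise ValueError("risk_scores and outcomes must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_events = sum(outcomes)
    if total_events == 0:
        raise ValueError("no events observed; capture rate is undefined")

    # Number of patients in the top-risk group (at least one).
    n_top = max(1, int(len(risk_scores) * fraction))

    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(risk_scores, outcomes), key=lambda pair: pair[0], reverse=True)
    captured = sum(outcome for _, outcome in ranked[:n_top])
    return captured / total_events


# Toy example with 10 patients and 2 events.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.05, 0.6, 0.7, 0.5]
toy_outcomes = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
toy_capture = capture_at_top_fraction(toy_scores, toy_outcomes, fraction=0.2)
```

With a 1.1% incidence rate, a capture rate of 37.4% in the top 5% means that group holds several times more events than a random 5% sample would.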