Objective To create and validate a simple and transferable machine learning model from electronic health record data to accurately predict clinical deterioration in patients with covid-19 across institutions, through use of a novel paradigm for model development and code sharing. Design Retrospective cohort study. Setting One US hospital during 2015-21 was used for model training and internal validation. External validation was conducted on patients admitted to hospital with covid-19 at 12 other US medical centers during 2020-21. Participants 33 119 adults (≥18 years) admitted to hospital with respiratory distress or covid-19. Main outcome measures An ensemble of linear models was trained on the development cohort to predict a composite outcome of clinical deterioration within the first five days of hospital admission, defined as in-hospital mortality or any of three treatments indicating severe illness: mechanical ventilation, heated high flow nasal cannula, or intravenous vasopressors. The model was based on nine clinical and personal characteristic variables selected from 2686 variables available in the electronic health record. Internal and external validation performance was measured using the area under the receiver operating characteristic curve (AUROC) and the expected calibration error—the difference between predicted risk and actual risk. Potential bed day savings were estimated by calculating how many bed days hospitals could save per patient if low risk patients identified by the model were discharged early. Results 9291 covid-19 related hospital admissions at 13 medical centers were used for model validation, of which 1510 (16.3%) were related to the primary outcome. When the model was applied to the internal validation cohort, it achieved an AUROC of 0.80 (95% confidence interval 0.77 to 0.84) and an expected calibration error of 0.01 (95% confidence interval 0.00 to 0.02). Performance was consistent when validated in the 12 external medical centers (AUROC range 0.77-0.84), across subgroups of sex, age, race, and ethnicity (AUROC range 0.78-0.84), and across quarters (AUROC range 0.73-0.83). Using the model to triage low risk patients could potentially save up to 7.8 bed days per patient resulting from early discharge. Conclusion A model to predict clinical deterioration was developed rapidly in response to the covid-19 pandemic at a single hospital, was applied externally without the sharing of data, and performed well across multiple medical centers, patient subgroups, and time periods, showing its potential as a tool for use in optimizing healthcare resources.
Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups.In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations.
Accurate risk models for adverse outcomes can provide important input to clinical decision-making. Surprisingly, one of the main challenges when using machine learning to build clinically useful risk models is the small amount of data available. Risk models need to be developed for specific patient populations, specific institutions, specific procedures, and specific outcomes. With each exclusion criterion, the amount of relevant training data decreases, until there is often an insufficient amount to learn an accurate model. This difficulty is compounded by the large class imbalance that is often present in medical applications.In this paper, we present an approach to address the problem of small data using transfer learning methods in the context of developing risk models for cardiac surgeries. We explore ways to build surgery-specific and hospital-specific models (the target task) using information from other kinds of surgeries and other hospitals (source tasks). We propose a novel method to weight examples based on their similarity to the target task training examples to take advantage of the useful examples while discounting less relevant ones.We show that incorporating appropriate source data in training can lead to improved performance over using only target task training data, and that our method of instance weighting can lead to further improvements. Applied to a surgical risk stratification task, our method, which used data from two institutions, performed comparably to the risk model published by the Society for Thoracic Surgeons, which was developed and tested on over one hundred thousand surgeries from hundreds of institutions.
OBJECTIVE There is a need for a systematic method to implement the World Health Organization’s Clinical Progression Scale (WHO-CPS), an ordinal clinical severity score for COVID-19 patients, to electronic health record (EHR) data. We discuss our process of developing guiding principles mapping EHR data to WHO-CPS scores across multiple institutions. MATERIALS AND METHODS Using WHO-CPS as a guideline, we developed the technical blueprint to map EHR data to ordinal clinical severity scores. We applied our approach to data from two medical centers. RESULTS Our method was able to classify clinical severity for 100% of patient days for 2,756 patient encounters across two institutions. DISCUSSION Implementing new clinical scales can be challenging; strong understanding of health system data architecture was integral to meet the clinical intentions of the WHO-CPS. CONCLUSION We describe a detailed blueprint for how to apply the WHO-CPS scale to patient data from the EHR.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.