Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies and could enable more focused investigation of the molecular pathogenetic mechanisms of this disease. However, the natural history of long COVID is incompletely understood and is characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling long COVID phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Using unsupervised machine learning (k-means clustering), we found six distinct clusters of long COVID patients, each with a distinct profile of phenotypic abnormalities, with enrichments in pulmonary, cardiovascular, and neuropsychiatric abnormalities and in constitutional symptoms such as fatigue and fever. There was a highly significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. We show that the clusters we identified in one hospital system were generalizable across different hospital systems. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on long COVID.
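The idea of clustering patients by pairwise phenotypic similarity can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy ontology `PARENTS`, the four patients, and the similarity measure (Jaccard overlap of ancestor-closed term sets), which is a simple stand-in and not necessarily the semantic similarity measure used in the study. The k-means routine is a small pure-Python version with a deterministic farthest-point initialization, applied to the rows of the similarity matrix.

```python
# Hypothetical toy ontology (child -> parents); a real analysis would use a
# full phenotype ontology rather than this hand-made example.
PARENTS = {
    "dyspnea": ["respiratory_abnormality"],
    "cough": ["respiratory_abnormality"],
    "palpitations": ["cardiovascular_abnormality"],
    "chest_pain": ["cardiovascular_abnormality"],
    "respiratory_abnormality": ["phenotypic_abnormality"],
    "cardiovascular_abnormality": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def closure(terms):
    """Ancestor-closed set of terms for one patient."""
    out = set()
    for t in terms:
        out |= ancestors(t)
    return out


def similarity(a, b):
    """Jaccard overlap of ancestor closures (illustrative stand-in measure)."""
    ca, cb = closure(a), closure(b)
    return len(ca & cb) / len(ca | cb)


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(rows, k, iters=20):
    """Plain k-means; farthest-point initialization keeps the toy run deterministic."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic patients, each described by a set of phenotype terms.
patients = {
    "p1": {"dyspnea"},
    "p2": {"cough"},
    "p3": {"palpitations"},
    "p4": {"chest_pain"},
}
names = sorted(patients)
matrix = [[similarity(patients[a], patients[b]) for b in names] for a in names]
labels = kmeans(matrix, k=2)
print(dict(zip(names, labels)))
```

In this toy run the two respiratory patients fall into one cluster and the two cardiovascular patients into the other. Each patient is represented by its row of similarities to all patients, which is one simple way to feed a similarity matrix to k-means.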
Cardiogenic shock (CS) is a severe condition with in-hospital mortality of up to 50%. Patients who develop CS may have a previous cardiac history, but this is not always the case, which adds to the challenge of identifying and managing these patients optimally. Patients may present to a medical facility with CS or develop CS while in the emergency department (ED), in a general inpatient ward (WARD), or in the critical care unit (CC). While different clinical pathways for management exist once CS is recognized, it remains difficult to identify these patients, in all settings, within a timeframe that allows proper management. We therefore developed and retrospectively evaluated a machine learning model based on the XGBoost (XGB) algorithm, which runs automatically on patient data from the electronic health record (EHR). The algorithm was trained on 8 years of de-identified data (from 2010 to 2017) collected from a large regional healthcare system. The input variables include demographics, vital signs, laboratory values, some orders, and specific pre-existing diagnoses. The model was designed to make predictions 2 h prior to the need for a first CS intervention (inotrope, vasopressor, or mechanical circulatory support). The algorithm achieves an overall area under the curve (AUC) of 0.87 (0.81 in CC, 0.84 in ED, 0.97 in WARD), a level considered useful for clinical use. The algorithm can be refined based on specific elements defining patient subpopulations, for example the presence of acute myocardial infarction (AMI) or congestive heart failure (CHF), further increasing its precision when a patient has these conditions. The top-contributing risk factors learned by the model are consistent with existing clinical findings. We conclude that a machine learning model can usefully predict the development of CS. This manuscript describes the main steps of the development process and our results.
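Reporting an overall AUC alongside per-setting AUCs (CC, ED, WARD) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the `records` list is synthetic, the function names are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, in which the AUC is the fraction of positive/negative pairs where the positive case receives the higher score, with ties counted as one half.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_setting(records):
    """Group (setting, label, score) records and compute one AUC per setting."""
    grouped = defaultdict(lambda: ([], []))
    for setting, label, score in records:
        grouped[setting][0].append(label)
        grouped[setting][1].append(score)
    return {s: auc(ys, ps) for s, (ys, ps) in grouped.items()}


# Synthetic records: (care setting, developed CS?, model risk score).
# These values are made up for illustration and are not study data.
records = [
    ("ED", 0, 0.2), ("ED", 1, 0.7), ("ED", 0, 0.4), ("ED", 1, 0.3),
    ("WARD", 0, 0.1), ("WARD", 1, 0.9),
    ("CC", 0, 0.6), ("CC", 1, 0.5), ("CC", 1, 0.8), ("CC", 0, 0.3),
]

overall = auc([y for _, y, _ in records], [p for _, _, p in records])
per_setting = auc_by_setting(records)
print(overall, per_setting)
```

The pairwise formulation is quadratic in the number of cases, which is fine for a small illustration; for a large cohort a rank-based implementation from an established library would be the more practical choice.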