Alzheimer’s Disease (AD) is a neurodegenerative disorder that is still not fully understood. Sex modifies AD vulnerability, but the reasons for this are largely unknown. We utilize two independent electronic medical record (EMR) systems across 44,288 patients to perform deep clinical phenotyping and network analysis to gain insight into clinical characteristics and sex-specific clinical associations in AD. Embeddings and network representation of patient diagnoses demonstrate greater comorbidity interactions in AD in comparison to matched controls. Enrichment analysis identifies multiple known and new diagnostic, medication, and lab result associations across the whole cohort and in a sex-stratified analysis. With this data-driven method of phenotyping, we can represent AD complexity and generate hypotheses of clinical factors that can be followed up for further diagnostic and predictive analyses, mechanistic understanding, or drug repurposing and therapeutic approaches.
Clinical trial emulation, the process of mimicking targeted randomized controlled trials (RCTs) with real-world data (RWD), has attracted growing attention and interest from the pharmaceutical industry in recent years. Unlike RCTs, which have stringent eligibility criteria for recruiting participants, RWD are more representative of the real-world patients to whom the drugs will be prescribed. One technical challenge for trial emulation is how to control confounding effectively in complex RWD so that treatment effects can be derived objectively. Many approaches, including deep learning algorithms, have recently been proposed for this goal, but there is still no systematic evaluation of them or practical guidance on their use.
In this paper, we emulate 430,000 trials from two large-scale RWD warehouses, covering both electronic health records (EHR) and general claims, with over 170 million patients spanning more than 10 years, aiming to identify new indications of approved drugs for Alzheimer's disease (AD). We investigate the behavior of multiple approaches, including logistic regression and deep learning models, and propose a new model selection strategy that significantly improves the confounding balance between participants in different arms of emulated trials. We demonstrate that a regularized logistic regression-based propensity score (PS) model outperforms the deep learning-based PS model and others, which to some extent contradicts our intuitions. Finally, we identify 8 drugs whose original indications are not AD (pantoprazole, gabapentin, acetaminophen, atorvastatin, albuterol, fluticasone, amoxicillin, and omeprazole) that hold great potential to benefit AD patients.
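As an illustration of what "confounding balance" between trial arms means in practice, the following is a minimal sketch of one common balance check: the standardized mean difference (SMD) of a covariate before and after inverse probability of treatment weighting. It is not the model selection strategy proposed in the paper. The toy cohort, the hard-coded propensity scores (a real pipeline would fit them, for example with regularized logistic regression), and the function names `smd` and `iptw_weights` are all assumptions made for this example.

```python
import math

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_var(values, weights):
    """Weighted (population-style) variance of values."""
    mu = weighted_mean(values, weights)
    return sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / sum(weights)

def iptw_weights(ps, treated):
    """Inverse probability of treatment weights: 1/ps for treated, 1/(1-ps) for controls."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, treated)]

def smd(x, treated, weights=None):
    """Absolute standardized mean difference of covariate x between the two arms."""
    if weights is None:
        weights = [1.0] * len(x)
    xt = [v for v, t in zip(x, treated) if t == 1]
    wt = [w for w, t in zip(weights, treated) if t == 1]
    xc = [v for v, t in zip(x, treated) if t == 0]
    wc = [w for w, t in zip(weights, treated) if t == 0]
    pooled_sd = math.sqrt((weighted_var(xt, wt) + weighted_var(xc, wc)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0
    return abs(weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd

# Hypothetical cohort: one binary confounder x that drives treatment assignment.
# Patients with x = 1 are treated 80% of the time, patients with x = 0 only 20%.
x = [1] * 10 + [0] * 10
treated = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
ps = [0.8] * 10 + [0.2] * 10  # assumed known here; normally estimated by a PS model

smd_before = smd(x, treated)
smd_after = smd(x, treated, iptw_weights(ps, treated))
print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```

In this toy cohort the arms are strongly imbalanced on `x` before weighting and balanced after it. A conventional rule of thumb treats an SMD below 0.1 as adequately balanced, and a balance-based comparison of candidate PS models would favor the model that leaves the fewest covariates above such a threshold.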
Objectives
Classifying hospital admissions into various acute myocardial infarction phenotypes in electronic health records (EHRs) is a challenging task with strong research implications that remains unsolved. To our knowledge, this is the first study to design and validate phenotyping algorithms using cardiac catheterizations to identify not only patients with an ST-elevation myocardial infarction (STEMI), but also the specific encounter when it occurred.
Materials and Methods
We design and validate multi-modal algorithms to phenotype STEMI on a multicenter EHR containing 5.1 million patients and 115 million patient encounters by using discharge summaries, diagnosis codes, electrocardiography readings, and the presence of cardiac catheterizations on the encounter.
Results
We demonstrate that phenotyping STEMIs by selecting discharge summaries containing “STEM” has the potential to capture the largest number of STEMIs (positive predictive value [PPV] = 0.36, N = 2110), but that adding a STEMI-related International Classification of Diseases (ICD) code and cardiac catheterizations to these summaries yields the highest precision (PPV = 0.94, N = 952).
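The trade-off between a broad rule and a stricter multi-modal rule can be sketched as follows. This is a hedged illustration only: the six encounters, the field names (`summary_hit`, `icd`, `cath`, `stemi`), and the resulting PPV values are hypothetical and are not the study's data or its algorithm.

```python
def ppv(flagged):
    """Positive predictive value: confirmed STEMIs among all flagged encounters."""
    flagged = list(flagged)
    if not flagged:
        return None  # PPV is undefined when nothing is flagged
    return sum(1 for e in flagged if e["stemi"]) / len(flagged)

# Hypothetical encounters. "stemi" stands in for a chart-review gold standard.
encounters = [
    {"summary_hit": True,  "icd": True,  "cath": True,  "stemi": True},
    {"summary_hit": True,  "icd": True,  "cath": True,  "stemi": True},
    {"summary_hit": True,  "icd": True,  "cath": False, "stemi": False},
    {"summary_hit": True,  "icd": False, "cath": False, "stemi": False},
    {"summary_hit": True,  "icd": False, "cath": True,  "stemi": True},
    {"summary_hit": False, "icd": True,  "cath": True,  "stemi": False},
]

# Broad rule: the discharge summary matches the search string.
broad = [e for e in encounters if e["summary_hit"]]

# Strict rule: summary match AND a STEMI-related ICD code AND a catheterization.
strict = [e for e in broad if e["icd"] and e["cath"]]

broad_ppv = ppv(broad)
strict_ppv = ppv(strict)
print(f"broad rule:  N = {len(broad)}, PPV = {broad_ppv:.2f}")
print(f"strict rule: N = {len(strict)}, PPV = {strict_ppv:.2f}")
```

Each added modality shrinks the flagged set, so a stricter rule can only raise PPV by discarding more false positives than true positives, usually at the cost of missing some real events.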
Discussion and Conclusion
In this study, we demonstrate that the incorporation of percutaneous coronary intervention increases the PPV for detecting STEMI-related patient encounters from the EHR.