Unifying Heterogenous Electronic Health Records Systems via Text-Based Code Embedding: Study of Predictive Modeling (Preprint)

Hur, Kyunghoon; Lee, Ji-Young; Oh, Jungwoo; Price, Wesley; Kim, Young‐Hak; Choi, Edward

doi:10.2196/preprints.32523

Cited by 6 publications

(14 citation statements)

References 26 publications

“…DescEmb [25] proposed to resolve this problem by suggesting a text-based embedding, where hospital-specific feature values are first converted to textual descriptions (e.g., "401.9" → "unspecified essential hypertension"), then a text encoder paired with a sub-word tokenizer is used to obtain m_i [37]. With Fig.…”

Section: B General Healthcare Predictive Framework (mentioning)

confidence: 99%
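
The conversion step described in the statement above can be sketched as follows. This is a minimal illustration, not the DescEmb implementation: the lookup table `CODE_DESCRIPTIONS`, the helper names, and the character n-gram tokenizer are all hypothetical stand-ins, and the text encoder that would turn the tokens into m_i is left out. Only the "401.9" example comes from the quoted text.

```python
from typing import Dict, List

# Hypothetical lookup table; the single entry is the example from the quoted statement.
CODE_DESCRIPTIONS: Dict[str, str] = {
    "401.9": "unspecified essential hypertension",
}


def describe(code: str, table: Dict[str, str] = CODE_DESCRIPTIONS) -> str:
    """Return the textual description of a code, or the raw code if it is unknown."""
    return table.get(code, code)


def subword_tokens(text: str, n: int = 3) -> List[str]:
    """Toy stand-in for a sub-word tokenizer: character n-grams within each word.

    A real system would use a trained tokenizer (assumption: something like
    WordPiece or BPE); n-grams are used here only to keep the sketch self-contained.
    """
    tokens: List[str] = []
    for word in text.lower().split():
        padded = f"<{word}>"  # mark word boundaries
        if len(padded) <= n:
            tokens.append(padded)
        else:
            tokens.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return tokens


description = describe("401.9")
tokens = subword_tokens(description)  # these tokens would be passed to a text encoder
```

Because the tokens are derived from the description rather than from the code itself, two systems that use different codes for the same concept would end up with overlapping tokens, which is the property the statement attributes to text-based embedding.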

“…Additionally, in multi-source learning, our framework is not constrained by the features that are present in each schema since both the name n_i^k and the value v_i^k of the feature are used. A formal comparison of the conventional approach, DescEmb [25] and our approach for obtaining m_i is provided below:…”

Section: Employing the Entire Features of EHR (mentioning)

confidence: 99%

“…In another study, DescEmb [25] aimed to overcome the heterogeneity of medical codes by utilizing the clinical descriptions linked to each code, thereby partially enabling multi-source learning. Despite its text-based embedding to avoid the manual code mapping process, this approach still necessitates domain experts to conduct EHR system-specific preprocessing to select compatible and meaningful features from the EHRs.…”

Section: Introduction (mentioning)

confidence: 99%

GenHPF: General Healthcare Predictive Framework for Multi-Task Multi-Source Learning

Hur, Oh, Kim et al. 2024

IEEE J. Biomed. Health Inform.

Despite the remarkable progress in the development of predictive models for healthcare, applying these algorithms on a large scale has been challenging. Algorithms trained on a particular task, based on specific data formats available in a set of medical records, tend to not generalize well to other tasks or databases in which the data fields may differ. To address this challenge, we propose General Healthcare Predictive Framework (GenHPF), which is applicable to any EHR with minimal preprocessing for multiple prediction tasks. GenHPF resolves heterogeneity in medical codes and schemas by converting EHRs into a hierarchical textual representation while incorporating as many features as possible. To evaluate the efficacy of GenHPF, we conduct multi-task learning experiments with single-source and multi-source settings, on three publicly available EHR datasets with different schemas for 12 clinically meaningful prediction tasks. Our framework significantly outperforms baseline models that utilize domain knowledge in multi-source learning, improving average AUROC by 1.2%P in pooled learning and 2.6%P in transfer learning while also showing comparable results when trained on a single EHR dataset. Furthermore, we demonstrate that self-supervised pretraining using multi-source datasets is effective when combined with GenHPF, resulting in a 0.6%P AUROC improvement compared to models…

“…In conditions where there is a lack of data, it is possible to enhance the performance of the model [214]. The EHRs-related tasks include prediction [33,126,214-222], information extraction from clinic notes [223-226], the international classification of disease (ICD) coding [227,228], medication recommendation [229,230], etc.…”

Section: EHRs in Pre-training (mentioning)

confidence: 99%

“…Xu et al [216] introduced the medical knowledge graph combined with self-supervised pre-training to deal with the sparsity and high-dimensional issue of EHR data. Lu et al [221] utilised a pre-trained model to detect disease complications and compute the contributions of particular diseases and admissions. Using the self-supervised learning method, the pre-trained model was trained based on the hidden disease representation.…”

Section: EEG (mentioning)

confidence: 99%

Pre-training in Medical Data: A Survey

et al. 2023

Medical data refers to health-related information associated with regular patient care or as part of a clinical trial program. There are many categories of such data, such as clinical imaging data, bio-signal data, electronic health records (EHR), and multi-modality medical data. With the development of deep neural networks in the last decade, the emerging pre-training paradigm has become dominant in that it has significantly improved machine learning methods’ performance in a data-limited scenario. In recent years, studies of pre-training in the medical domain have achieved significant progress. To summarize these technology advancements, this work provides a comprehensive survey of recent advances for pre-training on several major types of medical data. In this survey, we summarize a large number of related publications and the existing benchmarking in the medical domain. Especially, the survey briefly describes how some pre-training methods are applied to or developed for medical data. From a data-driven perspective, we examine the extensive use of pre-training in many medical scenarios. Moreover, based on the summary of recent pre-training studies, we identify several challenges in this field to provide insights for future studies.

The shaky foundations of large language models and foundation models for electronic health records

et al. 2023

The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models’ capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. Considering these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.
