Patient representation learning and interpretable evaluation using clinical notes

Sushil, Madhumita; Šuster, Simon; Luyckx, Kim; Daelemans, Walter

doi:10.1016/j.jbi.2018.06.016

Cited by 37 publications

(32 citation statements)

References 24 publications

“…Liu’s model [15] forecasts the onset of 3 kinds of diseases using medical notes. Sushil [16] utilizes a stacked denoised autoencoder and a paragraph vector model to learn generalized patient representation directly from clinical notes and the learned representation is used to predict mortality.…”

Section: Introduction (mentioning)

confidence: 99%

Combining structured and unstructured data for predictive models: a deep learning approach

Zhang, Yin, Zeng, et al. 2020, BMC Med Inform Decis Mak


Background: The broad adoption of electronic health records (EHRs) provides great opportunities to conduct health care research and solve various clinical problems in medicine. With recent advances and success, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many research studies utilize temporal structured data on predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous data types across EHRs through deep learning techniques may help improve the performance of prediction models.

Methods: In this research, we proposed 2 general-purpose multi-modal neural network architectures to enhance patient representation learning by combining sequential unstructured notes with structured data. The proposed fusion models leverage document embeddings for the representation of long clinical note documents and either convolutional neural network or long short-term memory networks to model the sequential clinical notes and temporal signals, and one-hot encoding for static information representation. The concatenated representation is the final patient representation which is used to make predictions.

Results: We evaluate the performance of proposed models on 3 risk prediction tasks (i.e. in-hospital mortality, 30-day hospital readmission, and long length of stay prediction) using derived data from the publicly available Medical Information Mart for Intensive Care III dataset. Our results show that by combining unstructured clinical notes with structured data, the proposed models outperform other models that utilize either unstructured notes or structured data only.

Conclusions: The proposed fusion models learn better patient representation by combining structured and unstructured data. Integrating heterogeneous data types across EHRs helps improve the performance of prediction models and reduce errors.

“…Miotto et al [25] adopted SDAs to generate patient representations. Furthermore, Sushil et al [26] derived task-independent patient representations directly from clinical notes by applying SDAs and a paragraph vector model. The above two methods only consider the frequency of medical events.…”

Section: Related Work (mentioning)

confidence: 99%

Representation learning for clinical time series prediction tasks in electronic health records

Ruan, Lei, Zhou, et al. 2019, BMC Med Inform Decis Mak


Background: Electronic health records (EHRs) provide possibilities to improve patient care and facilitate clinical research. However, there are many challenges faced by the applications of EHRs, such as temporality, high dimensionality, sparseness, noise, random error and systematic bias. In particular, temporal information is difficult to effectively use by traditional machine learning methods while the sequential information of EHRs is very useful.

Method: In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode inhospital records of each patient into a low dimensional dense vector.

Results: Based on EHR data collected from Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine, we experimentally evaluate our proposed RNN-DAE method on both mortality prediction task and comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms existing methods. In addition, we apply the “Deep Feature” represented by our proposed RNN-DAE method to track similar patients with t-SNE, which also achieves some interesting observations.

Conclusion: We propose an effective unsupervised RNN-DAE method to summarize patient sequential information in EHR data. Our proposed RNN-DAE method is useful on both mortality prediction task and comorbidity prediction task.


“…The learned representations are used to predict ICD codes occurring in the next 30, 60, 90, and 180 days. In contrast to the previous works, Sushil et al [20] focuses exclusively on EHR text to learn patient representations using unsupervised methods, such as stacked denoising autoencoders and doc2vec [7]. They find that the learned representations outperform traditional bag-of-words representations when few training examples are available and that the target task does not rely on strong lexical features.…”

Section: Introduction (mentioning)

confidence: 99%

“…[7] They find that the learned representations outperform traditional bag-of-words representations when few training examples are available and that the target task does not rely on strong lexical features. Like Sushil et al [20], our work uses text variables only.…”

Section: Introduction (mentioning)

confidence: 99%

Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse

Dligach, Afshar, Miller 2019, Journal of the American Medical Informatics Association


Objective: Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks.

Materials and Methods: Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task.

Results: We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data.

Discussion: We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task.

Conclusions: We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach.
