Background Predictive modeling with longitudinal electronic health record (EHR) data offers great promise for accelerating personalized medicine and better informing clinical decision-making. Recently, deep learning models have achieved state-of-the-art performance on many healthcare prediction tasks. However, deep models lack interpretability, which is integral to successful decision-making and can lead to better patient care. In this paper, we build upon the contextual decomposition (CD) method, an algorithm for producing importance scores from long short-term memory networks (LSTMs). We extend the method to bidirectional LSTMs (BiLSTMs) and use it to predict future clinical outcomes from patients' historical EHR visits. Methods We use a real EHR dataset comprising 11,071 patients to evaluate and compare CD interpretations from LSTM and BiLSTM models. First, we train LSTM and BiLSTM models for the task of predicting which pre-school children with respiratory system-related complications will have asthma at school age. We then conduct quantitative and qualitative analyses to evaluate the CD interpretations produced by the contextual decomposition of the trained models. In addition, we develop an interactive visualization to demonstrate the utility of CD scores in explaining predicted outcomes. Results Our experimental evaluation demonstrates that whenever a clear visit-level pattern exists, the models learn that pattern and contextual decomposition appropriately attributes the prediction to it. The results also confirm that the CD scores agree to a large extent with importance scores derived from logistic regression coefficients.
Our main insight was that rather than interpreting the attribution of individual visits to the predicted outcome, we can instead attribute a model's prediction to a group of visits. Conclusion We presented quantitative and qualitative evidence that CD interpretations can explain patient-specific predictions using CD attributions of individual visits or groups of visits.
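The group-of-visits insight above can be sketched in a few lines. This is illustrative only: the function name and data are hypothetical, and true CD computes a group's score by re-running the decomposed LSTM forward pass with the whole group marked as relevant, which is not in general the plain sum of individual scores.

```python
import numpy as np

def group_attribution(cd_scores, visit_group):
    """Aggregate per-visit CD importance scores over a group of visits.

    cd_scores  : 1-D array of CD scores, one per visit in the EHR timeline.
    visit_group: indices of the visits whose joint contribution we want.

    Simplified sketch: sums the individual per-visit scores as a proxy
    for the group's contribution to the predicted outcome.
    """
    cd_scores = np.asarray(cd_scores, dtype=float)
    return cd_scores[list(visit_group)].sum()

# Toy example: 5 visits; positive scores push the prediction toward asthma.
scores = [0.10, -0.05, 0.40, 0.30, -0.02]
group_score = group_attribution(scores, [2, 3])  # joint weight of visits 3-4
```

A positive `group_score` would indicate that this block of visits, taken together, pushed the model toward the asthma prediction.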
Background Early identification of pregnant women at risk for preterm birth (PTB), a major cause of infant mortality and morbidity, has significant potential to improve prenatal care. However, we lack effective predictive models which can accurately forecast PTB and complement these predictions with appropriate interpretations for clinicians. In this work, we introduce a clinical prediction model (PredictPTB) which combines variables (medical codes) readily accessible through the electronic health record (EHR) to accurately predict the risk of preterm birth at 1, 3, 6, and 9 months prior to delivery. Methods The architecture of PredictPTB employs recurrent neural networks (RNNs) to model the patient's longitudinal EHR visits and exploits a single code-level attention mechanism to improve predictive performance, while providing temporal code-level and visit-level explanations for the prediction results. We compare the performance of different combinations of prediction time-points, data modalities, and data windows. We also present a case study of our model's interpretability, illustrating how clinicians can gain some transparency into the predictions. Results Leveraging a large cohort of 222,436 deliveries, comprising a total of 27,100 unique clinical concepts, our model was able to predict preterm birth with an ROC-AUC of 0.82, 0.79, and 0.78, and a PR-AUC of 0.40, 0.31, and 0.24, at 1, 3, and 6 months prior to delivery, respectively. Results also confirm that observational data modalities (such as diagnoses) are more predictive for preterm birth than interventional data modalities (e.g., medications and procedures). Conclusions Our results demonstrate that PredictPTB can be utilized to achieve accurate and scalable predictions for preterm birth, complemented by explanations that directly highlight evidence in the patient's EHR timeline.
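The code-level attention mechanism described above can be illustrated with a minimal sketch. All names, dimensions, and the dot-product scoring function are assumptions for illustration; the abstract does not specify PredictPTB's exact attention formulation.

```python
import numpy as np

def code_level_attention(code_embeddings, query):
    """Attention over the medical codes of one visit.

    code_embeddings: (n_codes, d) matrix, one embedding per clinical code.
    query          : (d,) learned context vector (hypothetical here).

    Returns softmax attention weights over the codes and the attended
    visit representation (the weight vector is what yields the
    code-level explanations).
    """
    scores = code_embeddings @ query                 # (n_codes,)
    scores -= scores.max()                           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over codes
    visit_vec = weights @ code_embeddings            # weighted sum, (d,)
    return weights, visit_vec

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 8))   # 4 codes in the visit, 8-dim embeddings
q = rng.normal(size=8)
w, v = code_level_attention(E, q)
```

The weights `w` sum to 1 and rank the codes by their contribution to the visit representation `v`, which an RNN over the visit sequence would then consume.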
With the advent of ultra-high-throughput DNA sequencing technologies used in Next-Generation Sequencing (NGS) machines, we are facing a daunting new era of petabyte-scale bioinformatics data. The enormous amounts of data produced by NGS machines lead to storage, scalability, and performance challenges. At the same time, cloud computing architectures are rapidly emerging as robust and economical solutions to high-performance computing of all kinds. To date, these architectures have had limited impact on the sequence alignment problem, whereby sequence reads must be compared to a reference genome. In this research, we present a methodology for efficiently transforming one of the recently developed NGS alignment tools, SHRiMP, into the cloud environment based on the MapReduce programming model. Critical to the function and performance of our methodology is the implementation of several techniques and mechanisms for facilitating the task of porting the SHRiMP sequence alignment tool into the cloud. These techniques and mechanisms allow the "cloudified" SHRiMP to run as a black box within the MapReduce model, without the need to build new parallel algorithms or recode the tool from scratch. The approach is based on the MapReduce parallel programming model, its open-source implementation Hadoop, and its underlying distributed file system (HDFS). The deployment of the developed methodology utilizes the cloud infrastructure installed at Qatar University. Experimental results demonstrate that multiplexing large-scale SHRiMP sequence alignment jobs in parallel using the MapReduce framework dramatically improves performance when the user utilizes the resources provided by the cloud. In conclusion, using cloud computing for NGS data analysis is a viable and efficient alternative to analyzing data on in-house compute clusters.
The efficiency and flexibility of cloud computing environments and the MapReduce programming model give the SHRiMP sequence alignment tool a considerable performance boost. Using this methodology, ordinary biologists can perform computationally demanding sequence alignment tasks without the need to delve deep into server and database management, without the complexities and hassles of running jobs on grids and clusters, and without the need to modify the existing code to adapt it for parallel processing.