Objective: To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype.
Materials and methods: We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full-text article published in (1) the Journal of the American Medical Informatics Association, (2) the Journal of Biomedical Informatics, (3) the Proceedings of the Annual American Medical Informatics Association Symposium, and (4) the Proceedings of the Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included.
Results: Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, and 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining the cohort eligibility of patients.
Discussion: There has been a rise in the number of studies on cohort identification using electronic medical records. Statistical analysis and machine learning techniques, followed by NLP, have been gaining popularity over rule-based systems in recent years.
Conclusions: There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at the respective institutions. However, no system makes comprehensive use of electronic medical records while addressing all of their known weaknesses.
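To make the rule-based category above concrete, here is a minimal hypothetical sketch of a phenotyping rule that combines structured codes, medications, and a note keyword. All codes, drug names, and criteria are invented for illustration; real phenotyping algorithms (e.g., those in the reviewed studies) are far more elaborate.

```python
def has_diabetes_phenotype(record):
    """Hypothetical rule-based phenotype check: a structured ICD-9 hit
    plus at least one corroborating signal (medication or note keyword).
    All criteria are illustrative, not a validated algorithm."""
    icd_hit = any(code.startswith("250") for code in record.get("icd9", []))
    med_hit = any(m in {"metformin", "insulin"} for m in record.get("meds", []))
    note_hit = "diabetes" in record.get("note", "").lower() or \
               "t2dm" in record.get("note", "").lower()
    return icd_hit and (med_hit or note_hit)

# Toy patient records standing in for an EHR extract.
patients = [
    {"icd9": ["250.00"], "meds": ["metformin"], "note": "T2DM, on metformin"},
    {"icd9": ["401.9"], "meds": [], "note": "hypertension follow-up"},
]
cohort = [p for p in patients if has_diabetes_phenotype(p)]
```

Requiring a structured hit plus a corroborating signal is a common precision-oriented design choice in rule-based cohort definitions.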
State-of-the-art models using deep neural networks have become very good at learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized, knowledge-intensive domains, where training data is limited. To address this gap, we introduce MedNLI, a dataset annotated by doctors for a natural language inference (NLI) task, grounded in the medical history of patients. We present strategies to: 1) leverage transfer learning using datasets from the open domain (e.g., SNLI), and 2) incorporate domain knowledge from external data and lexical sources (e.g., medical terminologies). Our results demonstrate performance gains using both strategies.
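The transfer-learning strategy described here, pretrain on open-domain pairs, then continue training on clinical pairs, can be sketched with a deliberately tiny stand-in model. The feature set, perceptron, and example pairs below are all invented for illustration; the paper itself uses neural models, not this.

```python
def features(premise, hypothesis):
    """Toy hand-crafted features: bias, word overlap, and length ratio
    (a crude stand-in for a learned sentence representation)."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return [1.0, len(p & h) / max(len(h), 1), len(h) / max(len(p), 1)]

def train(pairs, weights=None, epochs=20):
    """Binary perceptron (1 = entailment, 0 = not). Passing `weights`
    continues training from a previous run -- the 'fine-tune' step."""
    w = list(weights) if weights else [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for prem, hyp, label in pairs:
            x = features(prem, hyp)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != label:
                w = [wi + (label - pred) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, premise, hypothesis):
    x = features(premise, hypothesis)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Invented open-domain (SNLI-like) and clinical (MedNLI-like) pairs.
open_domain = [
    ("a man is sleeping", "a person is asleep", 1),
    ("a dog runs in the park", "the cat is indoors", 0),
]
clinical = [
    ("patient denies chest pain", "no chest pain reported", 1),
    ("started on metformin today", "patient has never taken medication", 0),
]
w = train(open_domain)          # "pretrain" on open-domain pairs
w = train(clinical, weights=w)  # "fine-tune" on clinical pairs
```

The point is only the training schedule: the clinical stage starts from open-domain weights instead of zeros, which is the essence of the transfer-learning strategy the abstract describes.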
This paper presents the MEDIQA 2019 shared task organized at the ACL-BioNLP workshop. The shared task is motivated by the need to develop relevant methods, techniques, and gold standards for inference and entailment in the medical domain, and their application to improving domain-specific information retrieval and question answering systems. MEDIQA 2019 includes three tasks in the medical domain: Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and Question Answering (QA). 72 teams participated in the challenge, achieving best accuracies of 98% in the NLI task, 74.9% in the RQE task, and 78.3% in the QA task. In this paper, we describe the tasks, the datasets, and the participants' approaches and results. We hope that this shared task will attract further research efforts in textual inference, question entailment, and question answering in the medical domain.
Background: Readmissions after hospital discharge are a common occurrence and are costly for both hospitals and patients. Previous attempts to create universal risk prediction models for readmission have not met with success. In this study we leveraged a comprehensive electronic health record to create readmission-risk models that were institution- and patient-specific in an attempt to improve our ability to predict readmission.
Methods: This is a retrospective cohort study performed at a large midwestern tertiary care medical center. All patients with a primary discharge diagnosis of congestive heart failure, acute myocardial infarction, or pneumonia over a two-year time period were included in the analysis. The main outcome was 30-day readmission. Demographic, comorbidity, laboratory, and medication data were collected on all patients from a comprehensive information warehouse. Using multivariable analysis with stepwise removal, we created three disease-specific risk prediction models and a combined model. These models were then validated on separate cohorts.
Results: 3572 patients were included in the derivation cohort. Overall there was a 16.2% readmission rate. The acute myocardial infarction and pneumonia readmission-risk models performed well on a random-sample validation cohort (AUC range 0.73 to 0.76) but less well on a historical validation cohort (AUC 0.66 for both). The congestive heart failure model performed poorly on both validation cohorts (AUC 0.63 and 0.64).
Conclusions: The readmission-risk models for acute myocardial infarction and pneumonia validated well on a contemporary cohort but not as well on a historical cohort, suggesting that models such as these need to be continuously retrained and adjusted to respond to local trends. The poor performance of the congestive heart failure model may suggest that, for chronic disease conditions, social and behavioral variables are of greater importance, and improved documentation of these variables within the electronic health record should be encouraged.
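The AUC figures quoted in the results are rank-based statistics. A minimal sketch of computing AUC via the Mann-Whitney U formulation, with invented risk scores and outcomes, shows what the reported 0.63-0.76 values measure:

```python
def auc(scores, labels):
    """AUC via the rank-sum formulation: the probability that a randomly
    chosen readmitted patient (label 1) receives a higher predicted risk
    than a randomly chosen non-readmitted patient (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predicted 30-day readmission risks and observed outcomes.
risks  = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
print(auc(risks, labels))  # → 0.888...
```

An AUC of 0.5 corresponds to a model no better than chance, which is why the historical-cohort values of 0.63-0.66 indicate substantial degradation.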
Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties as well as utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.
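The generate-synthetic-notes-from-a-trained-model idea can be illustrated with a deliberately simple stand-in: a word-bigram language model trained on toy, already de-identified notes. The paper uses neural language models; this sketch (with invented example notes) only shows the overall train-then-sample workflow.

```python
import random
from collections import defaultdict

def train_bigram_lm(notes):
    """Word-bigram model: map each token to the list of tokens observed
    after it (sampling from the list reproduces bigram frequencies)."""
    model = defaultdict(list)
    for note in notes:
        tokens = ["<s>"] + note.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            model[a].append(b)
    return model

def generate(model, rng, max_len=20):
    """Sample a synthetic note token by token until end-of-note."""
    tokens, cur = [], "<s>"
    while len(tokens) < max_len:
        cur = rng.choice(model[cur])
        if cur == "</s>":
            break
        tokens.append(cur)
    return " ".join(tokens)

# Toy de-identified training notes (invented for illustration).
notes = [
    "patient admitted with chest pain",
    "patient admitted with shortness of breath",
    "chest pain resolved after treatment",
]
model = train_bigram_lm(notes)
synthetic = generate(model, random.Random(0))
```

Because bigrams recombine fragments from different notes, the sampled text need not match any single training note verbatim, which is a crude analogue of the privacy/utility trade-off the abstract evaluates.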