SemEHR: A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research

Wu, Honghan; Toti, Giulia; Morley, Katherine I.; Ibrahim, Zina; Folarin, Amos; Jackson, Roy; Kartoglu, Ismail E.; Agrawal, Asha; Stringer, Clive; Gale, Darren; Gorrell, Genevieve; Roberts, Angus; Broadbent, Matthew; Stewart, Robert; Dobson, Richard

doi:10.1101/235622

Cited by 13 publications (13 citation statements)

References 21 publications


“…Text analytics platforms such as semEHR (built on CogStack) [13, 14] and GATE [33] are increasingly being used across large document repositories, and can incorporate a range of NLP methods such as Bio-Yodie [15] (rules-based information extraction, used in this project) and machine learning methods. UCLH is proposing to make semEHR a core component of its new clinical research data warehouse.…”

Section: Discussion (mentioning, confidence: 99%)

“…Structured and free text data from the EHR were combined into a searchable indexed repository using the CogStack [13] platform, which contains pipelines for document processing and indexing, fast text searching, and distributed analysis. We used the SemEHR [14] biomedical document processing system on CogStack, with Elasticsearch for full free text search to explore text and annotations and Bio-Yodie [15] (an NLP application) to annotate text using the Unified Medical Language System (UMLS) [16]. SemEHR contextualises each mention of a UMLS concept with the experiencer (patient or other), affirmation status (affirmed, negative or hypothetical) and temporality (past or recent).…”

Section: Methods (mentioning, confidence: 99%)
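The Methods excerpt above describes each UMLS concept mention as carrying three contextual attributes. A minimal Python sketch of how such contextualised mentions could be represented and filtered is given here; the class and field names, the placeholder CUI `C0000001`, and the filtering rule are illustrative assumptions, not SemEHR's actual schema or API.

```python
from dataclasses import dataclass
from enum import Enum


class Experiencer(Enum):
    PATIENT = "patient"
    OTHER = "other"


class Affirmation(Enum):
    AFFIRMED = "affirmed"
    NEGATIVE = "negative"
    HYPOTHETICAL = "hypothetical"


class Temporality(Enum):
    PAST = "past"
    RECENT = "recent"


@dataclass(frozen=True)
class ConceptMention:
    """A UMLS concept mention with its context (hypothetical schema)."""
    doc_id: str
    cui: str   # UMLS Concept Unique Identifier
    text: str  # surface string found in the note
    experiencer: Experiencer
    affirmation: Affirmation
    temporality: Temporality


def current_positive_mentions(mentions, cui):
    """Keep mentions of `cui` that concern the patient, are affirmed, and are recent."""
    return [
        m for m in mentions
        if m.cui == cui
        and m.experiencer is Experiencer.PATIENT
        and m.affirmation is Affirmation.AFFIRMED
        and m.temporality is Temporality.RECENT
    ]


# Usage with a placeholder CUI (not a real UMLS lookup)
notes = [
    ConceptMention("d1", "C0000001", "pneumonia", Experiencer.PATIENT,
                   Affirmation.AFFIRMED, Temporality.RECENT),
    ConceptMention("d2", "C0000001", "no pneumonia", Experiencer.PATIENT,
                   Affirmation.NEGATIVE, Temporality.RECENT),
    ConceptMention("d3", "C0000001", "mother had pneumonia", Experiencer.OTHER,
                   Affirmation.AFFIRMED, Temporality.PAST),
]
print([m.doc_id for m in current_positive_mentions(notes, "C0000001")])  # ['d1']
```

Keeping the three attributes as enums rather than free strings means a misspelled value fails immediately instead of silently matching nothing.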


Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-automated Simulation Based on the LeoPARDS Trial

Tissot, Shah, Agbakoba, et al. (2019). Preprint. Self-citation.


Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is a resource-intensive task when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records has been explored as a way of identifying trial participants, but much of the information is in unstructured free text rather than in a computable form. We developed an electronic health record pipeline that combines structured electronic health record data with free text in order to simulate recruitment into the LeoPARDS trial. We applied an algorithm to identify eligible patients using a moving 1-hour time window, and compared the set of patients identified by our approach with those actually screened and recruited for the trial. We manually reviewed clinical records for a random sample of additional patients identified by the algorithm but not identified for screening in the original trial. Our approach identified 308 patients, of whom 208 were screened in the actual trial. We identified all 40 patients with CCHIC data available who were actually recruited to LeoPARDS in our centre. The algorithm identified 96 patients on the same day as manual screening and 62 patients one or two days earlier. Analysis of electronic health records incorporating natural language processing tools could effectively replicate recruitment in a critical care trial, and identify some eligible patients at an earlier stage. If implemented in real time, this could improve the efficiency of clinical trial recruitment.
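The abstract's moving 1-hour time window can be illustrated with a small sketch: a patient is flagged at the first time point at which every required criterion has been observed within the preceding hour. The criterion names and the `(timestamp, criterion)` event format are hypothetical; the actual LeoPARDS eligibility rules and the authors' pipeline are not reproduced here.

```python
from datetime import datetime, timedelta

# Window length taken from the abstract's "moving 1-hour time window"
WINDOW = timedelta(hours=1)


def first_eligible_time(events, required, window=WINDOW):
    """Return the earliest event time t at which every criterion in `required`
    was observed in the interval [t - window, t], or None if that never happens.

    events: iterable of (timestamp, criterion_name) pairs for one patient.
    """
    events = sorted(events)
    for i, (t_end, _) in enumerate(events):
        # Criteria observed up to t_end and no earlier than t_end - window
        seen = {c for t, c in events[: i + 1] if t_end - t <= window}
        if required <= seen:
            return t_end
    return None


# Hypothetical criteria: one derived from free text (NLP), one from structured data
t0 = datetime(2019, 1, 1, 8, 0)
patient_events = [
    (t0, "nlp_sepsis_mention"),
    (t0 + timedelta(minutes=90), "vasopressor_started"),
    (t0 + timedelta(minutes=120), "nlp_sepsis_mention"),
]
required = {"nlp_sepsis_mention", "vasopressor_started"}
print(first_eligible_time(patient_events, required))  # 2019-01-01 10:00:00
```

Recomputing the set of observed criteria at every event keeps the sketch short but is quadratic in the number of events; a sliding two-pointer sweep with per-criterion counters would make the scan linear after sorting.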


“…A recent review of clinical IE applications (Wang et al, 2018) notes the increasing interest in NLP but lists only 25 IE systems that were used multiple times outside the labs where they were created. Isolated attempts exist to apply IE in the context of EHR processing in frameworks for semantic search, for instance SemEHR, deployed to identify contextualized mentions of biomedical concepts within EHRs in a number of UK hospitals (Wu et al, 2018). We mention the following research prototypes as experimental developments based on some sort of IE: (Shi et al, 2017) reports on a system extracting textual medical knowledge from heterogeneous sources in order to integrate it into knowledge graphs; (Hassanpour and Langlotz, 2016) describes a machine learning system that annotates radiology reports and extracts concepts according to a model covering most clinically significant contents in radiology; presents the information extraction and retrieval architecture CogStack, deployed at King's College Hospital.…”

Section: Related Work (mentioning, confidence: 99%)

Risk Factors Extraction from Clinical Texts based on Linked Open Data

Boytcheva, Angelova, Angelov (2019). Proceedings - Natural Language Processing in a Deep Learning World.


This paper presents experiments in risk factor analysis based on clinical texts enhanced with Linked Open Data (LOD). The idea is to determine whether a patient has risk factors for a specific disease by analyzing only his/her outpatient records. A semantic graph of "meta-knowledge" about a disease of interest is constructed, with integrated multilingual terms (labels) of symptoms, risk factors, etc. coming from Wikidata, PubMed, Wikipedia and MeSH, and linked to clinical records of individual patients via ICD-10 codes. Then a predictive model is trained to predict whether patients are at risk of developing the disease of interest. Testing was done using outpatient records from a nationwide repository covering the period 2011-2016. The results show improved overall performance for all tested algorithms (kNN, Naïve Bayes, Tree, Logistic regression, ANN) when the clinical texts are enriched with LOD resources.
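The enrichment step in this abstract links disease meta-knowledge to patient records through ICD-10 codes. The sketch below shows one simple way such enrichment could add features before model training; the graph fragment, the `lod:` feature prefix, and the single-word term matching are assumptions for illustration, not the authors' feature design.

```python
import re
from collections import Counter

# Hypothetical fragment of a disease "meta-knowledge" graph:
# ICD-10 category -> term labels (risk factors, symptoms) linked to it.
KNOWLEDGE_GRAPH = {
    "E11": {"obesity", "hypertension", "polyuria"},  # type 2 diabetes mellitus
}


def enrich(record_text, icd10_codes, graph=KNOWLEDGE_GRAPH):
    """Bag-of-words features plus an 'lod:<term>' flag for each graph term,
    linked through the record's ICD-10 codes, that appears in the text.

    Assumes single-word terms; multi-word labels would need phrase matching.
    """
    tokens = re.findall(r"\w+", record_text.lower())
    features = Counter(tokens)
    linked_terms = set()
    for code in icd10_codes:
        category = code.split(".")[0]  # e.g. "E11.9" -> "E11"
        linked_terms |= graph.get(category, set())
    for term in sorted(linked_terms):
        if term in features:
            features[f"lod:{term}"] = 1
    return features


feats = enrich("Patient reports polyuria; BMI indicates obesity.", ["E11.9"])
print(sorted(k for k in feats if k.startswith("lod:")))  # ['lod:obesity', 'lod:polyuria']
```

The resulting feature dictionaries could then be vectorised and passed to any of the classifiers the abstract lists; that training step is omitted here.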


“…In the medical domain, SNOMED CT [7] and the Human Phenotype Ontology (HPO) [8] are examples of widely used ontologies to annotate clinical data. After the data has been annotated, it can be reused by clinicians to query EHRs [9, 10], to classify patients into different risk groups [11, 12], to detect a patient’s eligibility for clinical trials [13], and for clinical research [14].…”

Section: Introduction (mentioning, confidence: 99%)

Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies

et al. 2020


Background: Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations.

Methods: Two reviewers examined publications indexed by Scopus, IEEE, MEDLINE, EMBASE, the ACM Digital Library, and the ACL Anthology. Publications reporting on NLP for mapping clinical text from EHRs to ontology concepts were included. Year, country, setting, objective, evaluation and validation methods, NLP algorithms, terminology systems, dataset size and language, performance measures, reference standard, generalizability, operational use, and source code availability were extracted. The studies’ objectives were categorized by way of induction. These results were used to define recommendations.

Results: Two thousand three hundred fifty-five unique studies were identified. Two hundred fifty-six studies reported on the development of NLP algorithms for mapping free text to ontology concepts. Seventy-seven described development and evaluation. Twenty-two studies did not perform a validation on unseen data and 68 studies did not perform external validation. Of 23 studies that claimed that their algorithm was generalizable, 5 tested this by external validation. A list of sixteen recommendations regarding the usage of NLP systems and algorithms, usage of data, evaluation and validation, presentation of results, and generalizability of results was developed.

Conclusion: We found many heterogeneous approaches to reporting the development and evaluation of NLP algorithms that map clinical text to ontology concepts. Over one-fourth of the identified publications did not perform an evaluation. In addition, over one-fourth of the included studies did not perform a validation, and 88% did not perform external validation. We believe that our recommendations, alongside an existing reporting standard, will increase the reproducibility and reusability of future studies and NLP algorithms in medicine.
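The percentages in the Conclusion can be checked against the counts in the Results. This sketch assumes the denominator is the 77 studies that described both development and evaluation, an assumption that reproduces the reported 88%.

```python
# Counts as reported in the Results of the abstract above
EVALUATED = 77  # studies describing both development and evaluation
NO_UNSEEN_VALIDATION = 22
NO_EXTERNAL_VALIDATION = 68
CLAIMED_GENERALIZABLE = 23
EXTERNALLY_TESTED_CLAIMS = 5


def percent(part, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(percent(NO_UNSEEN_VALIDATION, EVALUATED))    # 28.6, i.e. "over one-fourth"
print(percent(NO_EXTERNAL_VALIDATION, EVALUATED))  # 88.3, i.e. "88%"
print(percent(EXTERNALLY_TESTED_CLAIMS, CLAIMED_GENERALIZABLE))  # 21.7
```

The last figure is not stated in the abstract; it only expresses the 5-of-23 count as a share of the studies claiming generalizability.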
