Accurate Clinical and Biomedical Named Entity Recognition at Scale

Kocaman, Veysel; Talby, David

doi:10.1016/j.simpa.2022.100373

Cited by 27 publications

(10 citation statements)

References 48 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…PSJH has an existing corpora of de-identified notes that were created using a sequence of operations performed on text data to remove PHI (protected health information) 34 . These operations included multiple pre-trained ML models and/or regular expressions.…”

Section: De-identification Of Patient Notesmentioning

confidence: 99%

Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records

Ralevski,

Taiyab,

Nossal

et al. 2024

Preprint

View full text Add to dashboard Cite

Social Determinants of Health (SDoH) are an important part of the exposome and are known to have a large impact on variation in health outcomes. In particular, housing stability is known to be intricately linked to a patient's health status, and pregnant women experiencing housing instability (HI) are known to have worse health outcomes. Most SDoH information is stored in electronic health records (EHRs) as free text (unstructured) clinical notes, which traditionally required natural language processing (NLP) for automatic identification of relevant text or keywords. A patient's housing status can be ambiguous or subjective, and can change from note to note or within the same note, making it difficult to use existing NLP solutions. New developments in NLP allow researchers to prompt LLMs to perform complex, subjective annotation tasks that require reasoning that previously could only be attempted by human annotators. For example, large language models (LLMs) such as GPT (Generative Pre-trained Transformer) enable researchers to analyze complex, unstructured data using simple prompts. We used a secure platform within a large healthcare system to compare the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results from these LLMs were compared with results from manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx). We developed a chain-of-thought prompt requiring evidence and justification for each note from the LLMs, to help maximize the chances of finding relevant text related to HI while minimizing hallucinations and false positives. Compared with GPT-3.5 and the NER model, GPT-4 had the highest performance and had a much higher recall (0.924) than human annotators (0.702) in identifying patients experiencing current or past housing instability, although precision was lower (0.850) compared with human annotators (0.971). In most cases, the evidence output by GPT-4 was similar or identical to that of human annotators, and there was no evidence of hallucinations in any of the outputs from GPT-4. Most cases where the annotators and GPT-4 differed were ambiguous or subjective, such as "living in an apartment with too many people". We also looked at GPT-4 performance on de-identified versions of the same notes and found that precision improved slightly (0.936 original, 0.939 de-identified), while recall dropped (0.781 original, 0.704 de-identified). This work demonstrates that, while manual annotation is likely to yield slightly more accurate results overall, LLMs, when compared with manual annotation, provide a scalable, cost-effective solution with the advantage of greater recall. At the same time, further evaluation is needed to address the risk of missed cases and bias in the initial selection of housing-related notes. Additionally, while it was possible to reduce confabulation, signs of unusual justifications remained. Given these factors, together with changes in both LLMs and charting over time, this approach is not yet appropriate for use as a fully-automated process. However, these results demonstrate the potential for using LLMs for computer-assisted annotation with human review, reducing cost and increasing recall. More efficient methods for obtaining structured SDoH data can help accelerate inclusion of exposome variables in biomedical research, and support healthcare systems in identifying patients who could benefit from proactive outreach.

show abstract

Section: De-identification Of Patient Notesmentioning

confidence: 99%

Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records

Ralevski,

Taiyab,

Nossal

et al. 2024

Preprint

View full text Add to dashboard Cite

show abstract

“…[1][2][3][4][5][6] While rule-based models extract phenotypes based on pre-defined patterns, most machine learning and deep-learning approaches are trained on sentences or documents labeled with the relevant phenotypes and the model subsequently classifies texts into these phenotypes. 5,7 MedspaCy 6 and scispaCy 8 are two recent and extensively-used hybrid frameworks that utilize statistical and machine-learning methods in conjunction with rule-based NLP to identify clinical phenotypes.…”

Section: Introductionmentioning

confidence: 99%

“…[1][2][3][4][5][6][7] While rule-based models extract phenotypes based on pre-defined patterns, most machine learning and deep-learning approaches are trained on sentences or documents labeled with the relevant phenotypes and the model subsequently classifies texts into these phenotypes. 5,8 SpaCy models, including MedspaCy 7 and scispaCy 9 are two recent and frequently used hybrid frameworks that utilize statistical and machine-learning named entity recognition methods in conjunction with rule-based NLP to identify clinical phenotypes. There are studies that have utilized medspaCy and scispaCy to identify specific sections within EHR text for NER, extract phenotypes from relation extraction documents, and generate text embeddings.…”

Section: Introductionmentioning

confidence: 99%

Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy’s Rule-based & Machine Learning-based methods

Bhattarai,

Oh,

Sierra

et al. 2023

Preprint

View full text Add to dashboard Cite

ObjectiveAccurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients’ health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI’s transformer-based Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal is to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy.Materials and MethodsPhenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, medspaCy and scispaCy by comparing precision, recall, and weighted F1 scores.ResultsGPT-4 achieves higher F1 score, precision, and recall compared to medspaCy and scispaCy’s models. GPT-3.5-turbo performs similar to that of GPT-4. GPT models are not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models rely on predefined patterns, leading to their suboptimal performance.Discussion and ConclusionGPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, robust clinical phenotype extraction, and improved ability to provide better care to the patients.

show abstract

“…It was estimated in 2017 that the footprint of medical data would double every 73 days by 2020 and continue to increase exponentially, and an estimated 80% of this data would be unstructured (2). However, this form of data remains largely inaccessible to statistical analysis (3,4). Furthermore, manual extraction of this data is costly and time consuming (5,6).…”

Section: Introductionmentioning

confidence: 99%

“…Spark NLP is a widely used library by healthcare organizations for NLP pipelines that are accurate and scale easily in a distributed environment (3). The Spark NLP Named Entity Recognition (NER) models use a BiLSTM-CNN-Char deep neural network architecture (14).…”

Section: Introductionmentioning

confidence: 99%

Development of a Natural Language Processing Pipeline to Identify Histological Subtypes and Site of Cancer from Pathology Reports

Ng¹,

Low

Tay

et al. 2022

Preprint

View full text Add to dashboard Cite

Purpose To develop a Natural Language Processing (NLP) pipeline with the ability to determine the histological subtype and site of a patient’s cancer from pathology reports. Methods A Spark NLP-based deep learning model pipeline was developed to perform named entity recognition (NER) and assertion status detection for histological subtypes before extracting key relations of interest to determine the site of a patient’s cancer from pathology reports. We assessed the ability of this NLP pipeline to extract histological subtypes and site of a patient’s cancer against manual curation of pathology reports. Results A total of 1358 reports from 474 patients seen at a single tertiary cancer centre were used in the development and validation of the pipeline. The NLP pipeline achieved a mean accuracy of 99.79% and an F1 score of 84.08% for NER of histological subtypes. The relation extraction (RE) model also achieved an average accuracy of 91.96% and an F1-score of 92.45% for key entity relations relevant to histological subtypes entities. Conclusion We developed an NLP pipeline that can extract the histological subtypes and relate them to the site of a patient’s cancer from free-text pathology reports with high accuracy. This has the potential to be deployed for both research and clinical quality processes.

show abstract

Accurate Clinical and Biomedical Named Entity Recognition at Scale

Cited by 27 publications

References 48 publications

Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records

Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records

Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy’s Rule-based & Machine Learning-based methods

Development of a Natural Language Processing Pipeline to Identify Histological Subtypes and Site of Cancer from Pathology Reports

Contact Info

Product

Resources

About