Automatic Extraction of Social Determinants of Health from Medical Notes of Chronic Lower Back Pain Patients

Lituiev, Dmytro; Lacar, Benjamin; Pak, Sang S.; Abramowitsch, Peter L; Marchis, Emilia De; Peterson, Thomas A

doi:10.1101/2022.03.04.22271541

Cited by 3 publications (5 citation statements)

References 55 publications


“…In previous works, SDOH have been extracted from clinical data using different methods. These state-of-the-art methods can be categorized into conventional methods like regular expressions, dictionaries or rule-based like cTAKES [33,34], or deep neural networks like CNN, LSTMs [35] and the latest Transformer based methods [34]. Surprisingly, in the large training sets, language model-based representations outperformed other trained neural nets.…”

Section: Discussion (mentioning; confidence: 99%)
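To make the "conventional" category concrete, here is a minimal sketch of a dictionary-plus-regular-expression tagger. The lexicon, category names, and the `tag_sdoh` helper are hypothetical illustrations; this is not cTAKES or any system cited above.

```python
import re

# Hypothetical toy lexicon (illustration only); real systems rely on curated vocabularies.
SDOH_PATTERNS = {
    "housing": [r"\bhomeless(?:ness)?\b", r"\bliving in (?:a )?shelter\b"],
    "tobacco": [r"\bsmok(?:es|er|ing)\b", r"\bcigarettes?\b"],
    "employment": [r"\bunemployed\b", r"\blost (?:his|her|their) job\b"],
}

# Compile one case-insensitive regex per category by OR-ing its patterns.
COMPILED = {
    category: re.compile("|".join(patterns), re.IGNORECASE)
    for category, patterns in SDOH_PATTERNS.items()
}


def tag_sdoh(note: str) -> list[tuple[str, str, int, int]]:
    """Return (category, matched_text, start, end) for every lexicon hit, in text order."""
    hits = []
    for category, regex in COMPILED.items():
        for m in regex.finditer(note):
            hits.append((category, m.group(0), m.start(), m.end()))
    return sorted(hits, key=lambda h: h[2])


note = "Patient is currently homeless and smokes 10 cigarettes a day."
for hit in tag_sdoh(note):
    print(hit)
```

The sketch has no negation handling, so a phrase like "denies smoking" would still be tagged as tobacco use; that kind of brittleness is one reason clinical pipelines typically add a negation-detection step, and one reason learned models are attractive.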

“…Figure 6: COVID-19 hospitalization by race and ethnicity. DISCUSSION. Previous works: Previous works have extracted SDOH information from clinical data using different methods such as regular expressions, dictionaries, rule-based methods like cTAKES [41,42], and deep neural networks like CNNs, LSTMs [43], and Transformer-based methods [42]. Language model-based representations have been found to perform well, especially with large training sets.…”

mentioning (confidence: 99%)


Discovering Social Determinants of Health from Case Reports using Natural Language Processing: Algorithmic Development and Validation

Raza, Dolatabadi, Ondrusek, et al. (2022)

Preprint


Background: Social determinants of health (SDOH) are non-medical factors that influence health outcomes. A wealth of SDOH information is available in electronic health records, clinical reports, and social media, usually as free text; extracting key information from it is a significant challenge that necessitates natural language processing (NLP) techniques. Objective: The objective of this research is to advance the automatic extraction of SDOH from clinical texts. Setting and Data: Case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create gold labels, and active learning is used for corpus re-annotation. Methods: A named entity recognition (NER) framework is developed and tested to extract SDOH, along with a few prominent clinical entities (diseases, treatments, diagnoses), from free text. The proposed model combines three components: a Transformer-based model, a BiLSTM model, and a CRF module. Results: The proposed NER implementation achieves an F1-score of 92.98% on our test set and generalizes well on benchmark data. A careful analysis of case examples demonstrates the superiority of the proposed approach in correctly classifying the named entities. Conclusions: NLP can be used to extract key information, such as SDOH, from free text. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.
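NER results such as the F1-score above are commonly computed at the entity level: a predicted entity counts only if its type and boundaries exactly match a gold entity. The abstract does not state the exact evaluation protocol, so the following is a minimal sketch of that common convention over BIO tag sequences; the tag names `HOUSING` and `DISEASE` are hypothetical.

```python
def bio_spans(tags):
    """Convert one BIO tag sequence into a set of (entity_type, start, end_exclusive) spans."""
    spans = set()
    start, etype = None, None
    # Append a sentinel "O" so a span still open at the end of the sequence gets closed.
    for i, tag in enumerate(list(tags) + ["O"]):
        # A span ends at "O", at any "B-", or at an "I-" whose type differs from the open span.
        boundary = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        )
        if boundary:
            if etype is not None:
                spans.add((etype, start, i))
            # Lenient, similar to the CoNLL evaluation script: a stray "I-X" starts a new X span.
            start, etype = (None, None) if tag == "O" else (i, tag[2:])
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Exact-match entity precision, recall and F1, pooled over all sentences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)  # same type and same boundaries
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed or mis-bounded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-HOUSING", "I-HOUSING", "O", "B-DISEASE"]]
pred = [["B-HOUSING", "I-HOUSING", "O", "O"]]
print(entity_prf(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

Exact matching is strict: a prediction that covers only "homeless" when the gold span is "homeless shelter" counts as both a false positive and a false negative. Some studies also report relaxed (overlap) matching, which would give higher numbers.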


“…Firstly, many studies have focused on a limited set of SDoH factors. The review [22] revealed that among the 82 methods examined, only three SDoH factors were commonly addressed: smoking status (27 methods), substance use (21), and homelessness (20). Other critical factors such as education, insurance status, and broader social issues are still in the developmental stage.…”

Section: Related Work (mentioning; confidence: 99%)

“…Notably, Patra et al conducted a comprehensive review of 82 NLP methods aimed at identifying SDoH [22]. These methods span various approaches, from rule-based to deep learning-based methods, presenting the identification of SDoH as either a classification problem [16 18 19] or a named entity recognition (NER) problem [15 20 21]. For example, Stemerman et al [18] designed a multi-label classifier to identify six SDoH categories within sentences extracted from clinical notes sourced from the University of North Carolina’s clinical data warehouse.…”

Section: Related Work (mentioning; confidence: 99%)
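Framing SDoH identification as multi-label classification means one sentence can carry several categories at once. A minimal sketch of the decoding step, assuming a model that outputs one logit per category: each category gets an independent sigmoid and threshold rather than a single softmax/argmax. The six category names and the logits are hypothetical placeholders, not the categories or outputs of Stemerman et al.

```python
import math

# Hypothetical category names for illustration; not the six categories used by Stemerman et al.
LABELS = ["housing", "employment", "social_support", "tobacco", "alcohol", "drug_use"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Apply an independent sigmoid per label and keep every label at or above the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical per-label logits for a sentence such as
# "Lives alone since losing his job; smokes daily."
logits = [-2.0, 1.5, 0.8, 2.3, -1.0, -3.0]
print(decode_multilabel(logits))  # ['employment', 'social_support', 'tobacco']
```

With softmax and argmax the same logits would yield only `tobacco`, silently dropping the other two factors; per-label thresholds (which can also be tuned per category) avoid that.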

“…The approaches range from rule-based techniques to deep-learning techniques and, more recently, large language models (LLMs) [16 17]. The problem is often formulated as a classification [16 18 19], entity recognition [20 21] or event extraction problem [15]. The majority of existing work is limited to a single dataset from a specific institution or domain.…”

Section: Introduction (mentioning; confidence: 99%)
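The three problem formulations differ mainly in what the model must output for the same text. A minimal sketch using hypothetical data structures (`Span`, `Event`, and the labels `Tobacco` and `Amount` are illustrative assumptions, not a schema from any cited corpus):

```python
from dataclasses import dataclass, field

SENTENCE = "Patient smokes one pack per day."


@dataclass
class Span:
    label: str
    start: int  # character offset, inclusive
    end: int  # character offset, exclusive

    def text(self, sentence: str) -> str:
        return sentence[self.start:self.end]


@dataclass
class Event:
    trigger: Span
    arguments: dict[str, Span] = field(default_factory=dict)


# 1) Classification: sentence-level labels only, no location in the text.
as_classification = {"tobacco"}

# 2) Entity recognition: typed spans over the text.
as_ner = [Span("Tobacco", 8, 14)]

# 3) Event extraction: a trigger span plus typed argument spans linked to it.
as_event = Event(
    trigger=Span("Tobacco", 8, 14),
    arguments={"Amount": Span("Amount", 15, 31)},
)

print(as_ner[0].text(SENTENCE), "|", as_event.arguments["Amount"].text(SENTENCE))
```

Each step up adds structure: classification says *whether* a factor is mentioned, entity recognition says *where*, and event extraction also links attributes such as amount or status to the mention.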


Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes – A Generalizable Approach across Institutions

Keloth, Selek, Chen, et al. (2024)

Preprint


The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried within clinical narrative text in electronic health records (EHRs), necessitating natural language processing (NLP) methods to automatically extract these details. Most current NLP efforts for SDoH extraction have been limited in scope: they investigate only a few types of SDoH elements, derive data from a single institution, focus on specific patient cohorts or note types, and pay little attention to generalizability. This study aims to address these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and by developing and evaluating the generalizability of classification models, including novel large language models (LLMs), for detecting SDoH factors from diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only, and level 2 with SDoH factors along with associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction-tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was noted in SDoH documentation practices and label distributions across patient cohorts, note types, and hospitals. The LLM achieved top performance, with micro-averaged F1 scores over 0.9 on the level 1 annotated corpora and an F1 over 0.84 on the level 2 annotated corpora.
While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. To foster collaboration, access to partial annotated corpora and models trained by merging all annotated datasets will be made available on the PhysioNet repository.
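Micro-averaged F1, as reported above, pools true positives, false positives, and false negatives across all notes and all factor labels before computing a single F1, so frequent factors dominate the score; macro-averaging instead gives every label equal weight. A minimal sketch of both over per-note label sets (the annotations and label names below are hypothetical; level 2 items could analogously be encoded as `(factor, value)` tuples, which is also an assumption):

```python
def micro_f1(gold_sets, pred_sets):
    """Micro-averaged F1: pool TP/FP/FN over every note and label, then compute one F1."""
    if len(gold_sets) != len(pred_sets):
        raise ValueError("gold and predicted lists must be aligned note by note")
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0  # convention used here: no true positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_sets, pred_sets, labels):
    """Macro-averaged F1: compute F1 per label, then take the unweighted mean."""
    per_label = [
        micro_f1([g & {label} for g in gold_sets], [p & {label} for p in pred_sets])
        for label in labels
    ]
    return sum(per_label) / len(per_label)


# Hypothetical level-1 annotations: the set of SDoH factor types found in each note.
gold = [{"housing", "tobacco"}, {"employment"}, {"tobacco"}]
pred = [{"housing"}, {"employment", "alcohol"}, {"tobacco"}]
labels = ["housing", "tobacco", "employment", "alcohol"]

print(micro_f1(gold, pred))  # 0.75
print(macro_f1(gold, pred, labels))  # about 0.667
```

Here the single spurious `alcohol` prediction costs little under micro-averaging but drags the macro score down by a full quarter, which is why the two averages can diverge sharply on corpora with rare SDoH factors.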


Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation

Raza, Dolatabadi, Ondrusek, et al. (2023)

BMC Digit Health


Background: Social determinants of health (SDOH) are non-medical factors that influence health outcomes. A wealth of SDOH information is available in electronic health records, clinical reports, and social media data, usually in free-text format. Extracting key information from free text poses a significant challenge and necessitates the use of natural language processing (NLP) techniques. Objective: The objective of this research is to advance the automatic extraction of SDOH from clinical texts. Setting and data: Case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground truth labels, and a semi-supervised learning method is used for corpus re-annotation. Methods: An NLP framework is developed and tested to extract SDOH from free text. A two-way evaluation method is used to assess the methods both quantitatively and qualitatively. Results: The proposed NER implementation achieves an F1-score of 92.98% on our test set and generalizes well on benchmark data. A careful analysis of case examples demonstrates the superiority of the proposed approach in correctly classifying the named entities. Conclusions: NLP can be used to extract key information, such as SDOH factors, from free text. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.

