2021
DOI: 10.3389/fpubh.2021.697501

Measuring the Value of a Practical Text Mining Approach to Identify Patients With Housing Issues in the Free-Text Notes in Electronic Health Record: Findings of a Retrospective Cohort Study

Abstract: Introduction: Despite the growing efforts to standardize coding for social determinants of health (SDOH), they are infrequently captured in electronic health records (EHRs). Most SDOH variables are still captured in the unstructured fields (i.e., free-text) of EHRs. In this study we attempt to evaluate a practical text mining approach (i.e., advanced pattern matching techniques) in identifying phrases referring to housing issues, an important SDOH domain affecting value-based healthcare providers, using EHR of…

citations
Cited by 10 publications
(4 citation statements)
references
References 44 publications
“…Measurement error may explain the mixed findings of independent associations between some SDH and postsepsis outcomes reported among our studies. Because data related to SDH are often not accessible through structured data fields, but embedded in free-text fields (61), continued development and application of novel clinical natural language processing (NLP) methods are needed to harness valuable SDH data from unstructured EHR data (62, 63).…”
Section: Discussion
mentioning
confidence: 99%
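“…Measurement error may explain the mixed findings of independent associations between some SDH and postsepsis outcomes reported among our studies. Because data related to SDH are often not accessible through structured data fields, but embedded in free-text fields (61), continued development and application of novel clinical natural language processing (NLP) methods are needed to harness valuable SDH data from unstructured EHR data (62, 63). However, even with improved extraction techniques such as NLP, EHR-derived SDH data are not likely sufficient to constitute a complete and accurate set of SDH domains, as many social and behavioral determinants that may influence health and mortality such as living arrangement and economic stability are not reliably captured and recorded (64).…”
Section: Measurement Error
mentioning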
confidence: 99%
“…Although the relevance of this Social Determinant of Health (SDOH) [5] has been identified and studied at multiple health levels (i.e., mental health [6], health inequality [7], or self-rated health [8]), its study is overshadowed by other SDOH including gender, race, or ethnicity [9, 10]; compromising its content [11], variability [12] and quality [13]. Moreover, most SDOH are stored as free-text unstructured data [14] making them difficult to handle and use.…”
Section: Introduction
mentioning
confidence: 99%
“…Other articles (n = 7) describe rule-based systems paired with traditional machine learning approaches i.e., an ensemble, particularly using NLP systems such as General Architecture for Text Engineering (GATE), Clinical Language Annotation, Modeling, and Processing Toolkit (CLAMP), Extract SDOH from EHRs (EASE), Yale clinical Text Analysis and Knowledge Extraction System (cTAKES), Relative Housing Stability in Electronic Documentation (ReHouSED), and toolkits such as spaCy and medspaCy in conjunction with conditional random fields and support vector machines [110-113]. In contrast, several investigators have leveraged open-source NLP toolkits like spaCy and medspaCy without supervised learners to extract SDoH variables [114-116]. Other studies (n = 19) have solely leveraged traditional supervised and unsupervised learning techniques, support vector machines (SVM), logistic regression (LR), Naïve Bayes, Adaboost, Random Forest, XGBoost, Bio-ClinicalBERT, Latent Dirichlet Allocation (LDA), and bidirectional Long Short-Term Memory (BI-LSTM) [16, 117-121] to extract and standardize social and behavioral determinants of health (SBDoH), e.g., alcohol abuse, drug use, sexual orientation, homelessness, substance use, sexual history, HIV status, drug use, housing status, transportation needs, housing insecurity, food insecurity, financial insecurity, employment/income insecurity, insurance insecurity, and poor social support.…”
mentioning
confidence: 99%