2020
DOI: 10.1093/jamia/ocaa079

sureLDA: A multidisease automated phenotyping method for the electronic health record

Abstract: Objective: A major bottleneck hindering utilization of electronic health record data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though International Classification of Diseases codes are often used as surrogates for true labels in this setting, these sometimes suffer fr…

Citations: cited by 33 publications (35 citation statements)
References: 28 publications
“…Topic modeling provides a method for identifying discussion of pseudogout in the EHR by combining information from a wide variety of features, including symptoms (e.g., joint swelling), laboratory tests (e.g., synovial fluid crystal analysis), and medications. We employed a novel topic modeling method, called sureLDA, because this method has recently been shown to work well for phenotyping a host of both acute and chronic diseases from EHR data (14). This method predicts a pseudogout propensity score (sureLDA score) for each of the 30,089 patients.…”
Section: Methods
Type: mentioning
confidence: 99%
“…Then z_PHECODE is predicted by ordinary least squares regression to obtain the regression coefficient vector β. The final PheNorm score of subject i can be obtained by the weighted linear combination of health care utilization and candidate feature sets, with feature vector z_i. The actual implementation follows the scripts in the R package sureLDA [50], where random sampling with replacement of observations in each training set is used to form a bootstrap of size 10^5.…”
Section: Methods
Type: mentioning
confidence: 99%
“…Large portions of them were healthy prior to admission so they had no rich data to mine. Large volume of missing data raises concerns about the reliability of our phenotyping algorithms [16][17][18][19][20][21][22][23][24][25][26][27][28]. In addition, during the surge, many seriously ill patients did not get coded as having an ICU visit (i.e., a major severity indicator) due to the bed shortage.…”
Section: Introduction Background
Type: mentioning
confidence: 99%