2020
DOI: 10.1093/jamia/ocaa215

Generative transfer learning for measuring plausibility of EHR diagnosis records

Abstract: Objective: Due to a complex set of processes involved with the recording of health information in the Electronic Health Records (EHRs), the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease. Materials and Methods: Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we de…

Cited by 14 publications (15 citation statements)
References 28 publications
“…We followed the same analytic process used by Estiri et al (2021) 30 that was used to identify risk factors for COVID-19 mortality from EHR data. From the MLHO framework, the computational process to conduct multivariate PheWAS involved applying the Minimize Sparsity, Maximize Relevance (MSMR) 23,31,32 algorithm, clinical expertise, and multinomial generalized linear modeling (GLM) with component-wise functional gradient boosting, and a composite confidence score to identify the phenotypes that are positively associated with a past PCR test (see eMethods).…”
Section: Methods (mentioning)
confidence: 99%
“…We followed a similar analytic process used by [31] that was used to identify risk factors for COVID-19 mortality from EHR data. From the MLHO framework, the computational process involved applying the Minimize Sparsity, Maximize Relevance (MSMR) algorithm, [23,32,33] clinical expertise, and multivariate boosting logistic regression, to compute a composite confidence score for identifying the phenotypes that are positively associated with a past RT-PCR test (see eMethods for more details).…”
Section: MLHO Framework (mentioning)
confidence: 99%
“…Among the methods used to train word embeddings, Word2vec [98] and Bidirectional Encoder Representations from Transformers (BERT) and variants [99][100][101][102] are the most frequently used (Supplementary Material Table S9). Word embeddings typically serve as the input layer to phenotyping algorithms using deep learning models and have also been used to account for ambiguous abbreviations and spelling errors in clinical notes [25,27,29,32,34,42,69,72,[74][75][76][77]83,[93][94][95][96][97][103][104][105][106][107][108][109][110][111][112][113][114].…”
Section: Data Types (mentioning)
confidence: 99%
“…Estiri et al utilized a self-learning approach to develop standard generative models (eg. Naive Bayes, Linear Discriminant Analysis) using a small set of labeled data (average 182 patients) and larger set of unlabeled (average 5956 patients) for classification of 18 phenotypes [114]. The approach performed on par with supervised learning, but required less labeled data (AUROC 0.78-0.99).…”
Section: Semi-supervised Learning (mentioning)
confidence: 99%