2018
DOI: 10.1101/275743
Preprint
Latent-Based Imputation of Laboratory Measures from Electronic Health Records: Case for Complex Diseases

Abstract: Imputation is a key step in Electronic Health Records (EHR) mining, as it can significantly affect the conclusions derived from downstream analysis. Three main categories explain missingness in clinical settings: incompleteness, inconsistency, and inaccuracy. These capture a variety of situations: the patient did not seek treatment, the health care provider did not enter the information, etc. We used EHR data from patients diagnosed with Inflammatory Bowel Disease from Geisin…
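The abstract does not fully specify the latent-based method, but the general idea of low-rank latent imputation can be sketched simply: treat the patient-by-lab matrix as approximately low rank and iteratively fill missing entries from a truncated SVD. This is a minimal illustration of the technique, not the authors' exact algorithm; the function name and toy data are my own.

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=200):
    """Fill NaNs in X by repeatedly projecting onto a rank-k latent space
    (a hard-impute style sketch; observed entries are never changed)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    # initialize missing entries with column means
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[mask] = low_rank[mask]  # update only the missing entries
    return filled

# toy example: an exactly rank-1 "lab matrix" with one missing value
X = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
X[2, 0] = np.nan   # the held-out true value is 12
filled = svd_impute(X, rank=1)
```

Because the toy matrix is exactly rank 1, the missing entry is uniquely determined by the observed entries, and the iteration recovers it; real lab data is only approximately low rank, so reconstruction there is approximate.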







Cited by 6 publications (5 citation statements)
References 13 publications
“…We are expanding the GNSIS dataset by incorporating a larger number of laboratory-based features; unstructured data from clinical notes such as signs and symptoms during the initial phases of patient evaluation; information about stroke subtypes; and genetic information from a subset of patients enrolled in the MyCode initiative [ 34 ]. We are also expanding our modeling strategies by (1) improving the imputation for laboratory values for EHR-mining [ 35 , 36 ], which could improve patient representation and reduce algorithmic bias; (2) applying natural language processing to expand the feature set from clinical notes; (3) developing polygenic risk score [ 37 ] using genetic information from a subset of our GNSIS cohort; (4) improving model parameter optimization using sensitivity analysis (SA)-based approaches [ 38 , 39 , 40 , 41 ]; and (5) expanding the study by incorporating more advanced methodologies, including deep learning models to compare with binary classification developed in this study. Finally, we are planning on developing models that account for the competing risk of death and other major vascular events in addition to ischemic stroke.…”
Section: Discussion
confidence: 99%
“…The encoding matrix was then used to create different levels of data abstraction by retaining only 100 or 1000 of the encoding using the dimensionality reduction technique (Equation (3)) for each dataset. We used these predefined cut-off values based on our preliminary assessment [ 19 ], as well as empirical studies [ 20 , 21 ]. For comparison, the full rank was also used in the modeling.…”
Section: Methods
confidence: 99%
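The quoted passage describes retaining only the top 100 or 1000 components of an encoding matrix. The exact dimensionality-reduction technique (their Equation (3)) is not reproduced in this snippet; as a hedged sketch of the general pattern, truncated SVD keeps the top-k singular directions of an encoding matrix. The function and toy data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def truncate_encoding(E, k):
    """Project an encoding matrix E onto its top-k singular directions,
    returning one k-dimensional row per patient."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]  # scale left singular vectors by singular values

# toy encoding matrix with true rank 2 (10 patients, 5 encoded features)
rng = np.random.RandomState(0)
E = rng.rand(10, 2) @ rng.rand(2, 5)
Z = truncate_encoding(E, 2)
```

When the true rank does not exceed k, the truncation is lossless (the Frobenius norm is preserved); with cut-offs like 100 or 1000 on real EHR encodings it is an approximation.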
“…[67,68] Imputation, predicting missing values, also has its unique challenges. Standardized techniques such as the MICE algorithm [69] or novel imputation methods [70] have been proposed. Other challenges in mining EHR data include: 1) different protocols and changes are introduced at various time periods, without documentation for the research team; and 2) policy changes and reimbursement rules are introduced that may affect how patients seek care and how treatment is re-designed based on their needs and their insurance coverage.…”
Section: Challenges and Perspectives
confidence: 99%
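For the MICE algorithm mentioned above, a commonly used implementation is scikit-learn's experimental IterativeImputer, which imputes each incomplete column by regressing it on the others in round-robin fashion. The sketch below is an illustration of that MICE-style approach on synthetic lab-like data, not the cited papers' exact method; all variable names and the data-generating setup are my own assumptions.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
labs = rng.normal(size=(100, 3))                      # 100 patients, 3 lab values
labs[:, 2] = labs[:, 0] + 0.1 * rng.normal(size=100)  # lab 2 correlates with lab 0
labs[::10, 2] = np.nan                                # 10% of lab 2 is missing

# chained-equations imputation: each column modeled from the others
imputer = IterativeImputer(random_state=0, max_iter=10)
completed = imputer.fit_transform(labs)
```

Because lab 2 is nearly a linear function of lab 0, the chained regressions recover the missing entries closely; with weaker between-lab correlation, imputed values regress toward the column mean.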