2017
DOI: 10.1101/167858
Preprint

Characterizing and Managing Missing Structured Data in Electronic Health Records

Abstract: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR) based analyses. Failure to appropriately consider missing data can lead to biased results. Here, we provide detailed procedures for when and how to conduct imputation of EHR data. We demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered. We analyzed clinical la…

Cited by 5 publications (4 citation statements)
References 20 publications
“…We provide the source code to reproduce this work in our repository on GitHub (GitHub, Inc) [10] under a permissive open source license. In addition, we used continuous analysis [11] to generate Docker Hub (Docker Inc) images matching the environment of the original analysis and to create intermediate results and logs.…”
Section: Methods (mentioning)
confidence: 99%
“…In addition, we used continuous analysis [11] to generate Docker Hub (Docker Inc) images matching the environment of the original analysis and to create intermediate results and logs. These artifacts are freely available [12].…”
Section: Methods (mentioning)
confidence: 99%
“…In a recent study, 12 different imputation techniques that were applied to laboratory measures from EHR were compared. In general, authors found that Multivariate Imputation by Chained Equations (MICE) and softImpute consistently imputed missing values with low error [6]; however, in that study, analysis was restricted to 28 most commonly available variables. In another study, authors assessed the different causes of missing data in the EHR data and identified these causes to be the source of unintentional bias [7].…”
Section: Introduction (mentioning)
confidence: 89%
“…Second, considerations for the choice of imputation strategy need to be made. While there is no such thing as best imputation method, comparisons of error, bias, and implementation difficulty can be leveraged in making a knowledgeable choice [60]. This, however, comes with the caveat that such conclusions may not be generalizable between different datasets.…”
Section: Challenges (mentioning)
confidence: 99%