2014
DOI: 10.1016/j.jbi.2014.03.016

Identifying and mitigating biases in EHR laboratory tests

Abstract: Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has a different interpretation for a chronic kidney disease patient than for a patient with acute kidney injury. We study whether EHR research studies are sub…

citations
Cited by 86 publications
(69 citation statements)
references
References 31 publications
“…In contrast, all of the data in the STRIDE database that were used in our study is from current patients, whether in an inpatient or an outpatient setting. The different contexts in which the laboratory tests were performed could explain the difference in the reference intervals found in our study and in these previous studies [44]. …”
Section: Discussion (contrasting)
confidence: 78%
“…The relationship we observed between the measurements of a laboratory value and a vital sign are consistent with those of Pivovarov and colleagues, who found that, among 20 years of EHR data from 14,000 ambulatory internal medicine patients, the testing patterns of certain laboratory tests conveyed separate information from the test results’ numerical values themselves. [29] Interestingly, however, the numerical value of LDL cholesterol testing was not associated with its frequency of testing in that study, a finding the authors attributed to healthcare processes such as guidelines for screening and monitoring. Our analyses of outpatient measurements are also consistent with results from 10,000 patients receiving anesthetic services at one medical center, where illness severity was associated with a greater number of days with clinical data[30].…”
Section: Discussion (mentioning)
confidence: 76%
“…[22, 23, 32, 33] Pivovarov and colleagues used histograms to examine laboratory testing dynamics; in doing so, they identified multimodality in testing patterns associated with, for example, inpatient versus outpatient status. [29] Baseline measurements of cholesterol and BP are central to CVD prediction in the virtual cohort study we are creating. Our V-shaped plot of the timeframe between these measurements will inform how we examine the impact of data distribution on the analytic validity of our CVD prediction models.…”
Section: Discussion (mentioning)
confidence: 99%
“…But because all EHR data have missing values in time, an ever-present issue is how to incorporate time [43], a question often addressed by framing the data through the lens of missingness [44–47] or imputation and interpolation. For example, some authors use missingness of data as a feature [48,49,7] that can be used to define phenotypes. But more often researchers focus on imputation schemes, or methods for interpolating missing values [50,51,21,52–54].…”
Section: Introduction (mentioning)
confidence: 99%
“…It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al, 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al, 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008), is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary.…”
mentioning
confidence: 99%