2018
DOI: 10.1016/j.jbi.2017.12.017

Yield and bias in defining a cohort study baseline from electronic health record data

Abstract: Aims: Despite growing interest in using electronic health records (EHR) to create longitudinal cohort studies, the distribution and missingness of EHR data might introduce selection bias and information bias to such analyses. We aimed to examine the yield and potential for these healthcare process biases in defining a study baseline using EHR data, using the example of cholesterol and blood pressure (BP) measurements. Methods: We created a virtual cohort study of cardiovascular disease (CVD) from patients with…

Cited by 14 publications (10 citation statements). References 40 publications.
“…We created a prospective cohort study of incident ASCVD events from national VA data, using previously described methods. 11 In brief, this cohort included all VA patients aged at least 18 years with at least 1 primary care visit at a VA facility who had at least 1 outpatient lipid result between 2002 and 2007 and a blood pressure measurement within 30 days of this index lipid testing, a criterion based on our prior work demonstrating that such a restriction limits potential bias within this cohort. 11 The date of index lipid determination served as the baseline date of entry to the cohort.…”
Section: Methods (mentioning)
confidence: 99%
“…11 In brief, this cohort included all VA patients aged at least 18 years with at least 1 primary care visit at a VA facility who had at least 1 outpatient lipid result between 2002 and 2007 and a blood pressure measurement within 30 days of this index lipid testing, a criterion based on our prior work demonstrating that such a restriction limits potential bias within this cohort. 11 The date of index lipid determination served as the baseline date of entry to the cohort. Individuals were excluded from the cohort if they had a history of HIV infection, cancer, significant kidney or liver disease, schizophrenia, or dementia at baseline (eTable 1 in the Supplement).…”
Section: Population (mentioning)
confidence: 99%
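The cohort definition quoted above can be read as a per-patient eligibility check. The following is only a minimal sketch of that reading, not the cited authors' implementation. The record layout (a dict with `birth_date`, `primary_care_visits`, `lipid_dates`, `bp_dates`, `conditions_at_baseline`), the condition labels, and the function name `baseline_date` are all hypothetical. It also assumes that the index lipid test is the earliest result in the window, that "between 2002 and 2007" covers whole calendar years, and that "within 30 days" means 30 days on either side of the index date; the quoted text does not settle any of these.

```python
from datetime import date

# Assumed: "between 2002 and 2007" means whole calendar years, inclusive.
WINDOW_START = date(2002, 1, 1)
WINDOW_END = date(2007, 12, 31)
BP_WINDOW_DAYS = 30  # assumed symmetric around the index lipid date
# Hypothetical labels standing in for the exclusion diagnoses listed above.
EXCLUDED = {"hiv", "cancer", "kidney_disease", "liver_disease",
            "schizophrenia", "dementia"}


def baseline_date(patient):
    """Return the cohort entry date for one patient, or None if ineligible."""
    lipids = [d for d in patient["lipid_dates"]
              if WINDOW_START <= d <= WINDOW_END]
    if not lipids or not patient["primary_care_visits"]:
        return None
    index = min(lipids)  # assumed: index test = earliest in-window result
    born = patient["birth_date"]
    # Age in completed years on the index date.
    age = index.year - born.year - (
        (index.month, index.day) < (born.month, born.day))
    if age < 18:
        return None
    # Require a BP measurement close to the index lipid test.
    if not any(abs((bp - index).days) <= BP_WINDOW_DAYS
               for bp in patient["bp_dates"]):
        return None
    if EXCLUDED & set(patient["conditions_at_baseline"]):
        return None
    return index


example = {
    "birth_date": date(1950, 3, 10),
    "primary_care_visits": [date(2003, 4, 20)],
    "lipid_dates": [date(2001, 11, 2), date(2003, 5, 1), date(2005, 1, 15)],
    "bp_dates": [date(2003, 5, 20)],
    "conditions_at_baseline": set(),
}
print(baseline_date(example))  # 2003-05-01
```

In this sketch a patient whose earliest in-window lipid result has no nearby BP measurement is dropped rather than re-indexed to a later lipid test; that choice is an assumption and would change the cohort's yield.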
“…There are few data-driven approaches that search for multiple variables of diverse organ systems including the kidney, bone, and liver, that might be associated with mortality in a general and otherwise healthy population (5, 6). Furthermore, investigations that utilize administrative data (e.g., electronic health records, insurance claims) may be fraught with selection bias (e.g., administrative samples may have a higher prevalence of unhealthy individuals than noninstitutionalized populations) (7–9). Over the past few decades, the challenges of “over testing” and screening in specific use-cases have rightly been considered (10–13).…”
Section: Introduction (mentioning)
confidence: 99%
“…Some commercial entities offer similar services, including Mendel.AI and Deep6AI, though peer-reviewed evidence of their development and performance metrics is unavailable, raising questions about how these approaches perform [21, 22]. A potential opportunity of this approach is that it allows trialists to avoid relying on the completeness of structured data fields for participant identification, which has been shown to significantly bias trial cohorts [23, 24]. Unfortunately, to the extent that novel ML approaches to patient identification rely on EHRs, biases in the EHR data may affect the algorithms’ performances, leading to replacement of one source of bias (underlying the completeness of structured data) with another (underlying the generation of EHR documentation).…”
Section: The Role Of ML In Clinical Trial Participant Management (mentioning)
confidence: 99%