Background
Understanding the validity of data from electronic data research networks is critical to national research initiatives and learning healthcare systems for cardiovascular care. Our goal was to evaluate how closely data from an electronic data research network agree with data collected by standardized research approaches in a cohort study.
Methods
We linked individual-level data from the Multi-Ethnic Study of Atherosclerosis (MESA), a community-based cohort, with HealthLNK, a 2006–2012 database of electronic health records (EHRs) from six Chicago health systems. To evaluate the correlation and agreement of blood pressure (BP) in HealthLNK with BP from in-person MESA examinations, and of body mass index (BMI) in HealthLNK with BMI in MESA, we used Pearson correlation coefficients and Bland-Altman plots. Using diagnoses in MESA as the criterion standard, we calculated the performance (sensitivity and specificity) of HealthLNK for identifying hypertension (HTN), obesity, and diabetes using ICD-9 codes and clinical data. We also identified potential myocardial infarctions (MIs), strokes, and heart failure events in HealthLNK and compared them with adjudicated events in MESA.
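The two agreement measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function names and the six paired systolic BP values are hypothetical, and the limits of agreement use the conventional mean difference ± 1.96 standard deviations, which the abstract does not confirm as the authors' exact method.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(ehr, cohort):
    """Return (bias, lower limit, upper limit) for EHR minus cohort values.

    bias is the mean paired difference; the limits of agreement are
    bias +/- 1.96 * SD of the differences (conventional choice, assumed here).
    """
    diffs = [e - c for e, c in zip(ehr, cohort)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired systolic BP readings (mmHg), NOT data from the study.
cohort_sbp = [118.0, 132.0, 125.0, 141.0, 110.0, 150.0]
ehr_sbp = [126.0, 130.0, 135.0, 146.0, 121.0, 149.0]

r = pearson_r(ehr_sbp, cohort_sbp)
bias, loa_lower, loa_upper = bland_altman(ehr_sbp, cohort_sbp)
print(f"r = {r:.2f}, bias = {bias:.1f} mmHg, "
      f"limits of agreement = ({loa_lower:.1f}, {loa_upper:.1f})")
```

A positive bias in this sketch means the EHR values run higher than the cohort values. Correlation and agreement answer different questions, which is why both are reported: two measurements can be highly correlated yet systematically offset.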
Results
Of the 1,164 MESA participants enrolled at the Chicago Field Center, 802 (68.9%) had data in HealthLNK. The correlation between HealthLNK and MESA was low for systolic BP (0.39; P<0.0001). Compared with MESA, HealthLNK overestimated systolic BP by 6.5 mmHg (95% CI: 4.2, 7.8). There was a high correlation between BMI in MESA and HealthLNK (0.94; P<0.0001). HealthLNK underestimated BMI by 0.3 kg/m² (95% CI: −0.4, −0.1). Using ICD-9 codes and clinical data, the sensitivity and specificity of HealthLNK queries were 82.4% and 59.4% for HTN, 73.0% and 89.8% for obesity, and 79.8% and 93.3% for diabetes. Compared with adjudicated CVD events in MESA, the concordance rates for MI, stroke, and heart failure were 41.7% (5/12), 61.5% (8/13), and 62.5% (10/16), respectively.
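The proportions reported above follow directly from the stated counts, and sensitivity and specificity follow from a 2×2 table against the criterion standard. The sketch below reproduces the linkage and concordance percentages from the counts given in the text. The abstract does not report the 2×2 cell counts behind the sensitivity and specificity figures, so the counts passed to `sensitivity_specificity` are hypothetical placeholders, as are the function names.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Counts taken from the text above.
linked = pct(802, 1164)
concordance = {
    "MI": pct(5, 12),
    "stroke": pct(8, 13),
    "heart failure": pct(10, 16),
}

# Hypothetical 2x2 counts for illustration only (not reported in the abstract).
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)

print(linked, concordance, sens, spec)
```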
Conclusions
These findings illustrate the limitations and strengths of electronic data repositories compared with information collected by traditional standardized epidemiologic approaches for the ascertainment of CVD risk factors and events.