Background
Clinical and epidemiological research based on electronic medical records (EMRs) has increased dramatically over the last decade, but establishing the generalizability of such large databases for epidemiological studies remains an ongoing challenge. To draw meaningful inferences from these studies, it is essential to fully understand the characteristics of the underlying population and the potential biases in EMRs.
Objective
This study aimed to assess the generalizability and representativeness, for population health research, of the widely used US Centricity Electronic Medical Record (CEMR), a primary and ambulatory care EMR, using data from the National Ambulatory Medical Care Survey (NAMCS) and the National Health and Nutrition Examination Survey (NHANES).
Methods
The number of office visits reported in the NAMCS, which is designed to provide objective and reliable information about the provision and use of ambulatory medical care services, was compared with corresponding data from the CEMR. Likewise, the distribution of major cardiometabolic diseases in the NHANES, which is designed to assess the health and nutritional status of adults and children in the United States, was compared with corresponding data from the CEMR.
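The visit comparison rests on simple arithmetic: the number of visits divided by the population at risk, scaled to 100 persons per year. The Python sketch below illustrates that calculation with a normal-approximation confidence interval. It assumes visit counts are Poisson distributed, which is an illustrative assumption and not necessarily the estimation method used in the study (the NAMCS relies on its own survey design). The function `visits_per_100` and its inputs are hypothetical.

```python
import math


def visits_per_100(visits: int, person_years: float, z: float = 1.96):
    """Return (rate, (lower, upper)) in visits per 100 persons per year.

    Hypothetical helper. ASSUMPTION: visit counts are Poisson distributed,
    so the standard error of the count is sqrt(visits).
    """
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    rate = 100.0 * visits / person_years
    # Scale the Poisson standard error the same way as the rate.
    se = 100.0 * math.sqrt(visits) / person_years
    return rate, (rate - z * se, rate + z * se)


# Hypothetical inputs with the same rate but a 100-fold larger population.
small_rate, small_ci = visits_per_100(2_846, 1_000)
large_rate, large_ci = visits_per_100(284_600, 100_000)
print(small_rate, small_ci)
print(large_rate, large_ci)
```

Because the standard error shrinks as the population grows, a very large EMR can produce a much narrower interval than a sample survey at the same rate. This is one plausible reason why the CEMR interval reported in the Results is far tighter than the NAMCS interval.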
Results
Gender and ethnicity distributions were similar between the NAMCS and the CEMR, whereas younger patients (aged <15 years) were underrepresented in the CEMR. The number of office visits per 100 persons per year was similar: 277.9 (95% CI 259.3-296.5) in the NAMCS and 284.6 (95% CI 284.4-284.7) in the CEMR. However, the number of visits for males was significantly higher in the CEMR (270.8) than in the NAMCS (239.0). The West and South regions were underrepresented and overrepresented, respectively, in the CEMR. The overall prevalence of diabetes, as well as its age and gender distribution, was similar in the CEMR and the NHANES (CEMR vs NHANES): overall, 10.1% vs 9.7%; males, 11.5% vs 10.8%; females, 9.1% vs 8.8%; age 20 to 40 years, 2.5% vs 1.8%; and age 40 to 60 years, 9.4% vs 11.1%. The prevalence of obesity was also similar (42.1% vs 39.6%), with a similar age distribution and similar prevalence among females (41.5% vs 41.1%) but a different prevalence among males (42.7% vs 37.9%). The overall prevalence of high cholesterol, as well as its age and female distribution, was similar in the CEMR and the NHANES: overall, 12.4% vs 12.4%; females, 14.8% vs 13.2%. The overall prevalence of hypertension was significantly higher in the CEMR (33.5%) than in the NHANES (95% CI 27.0%-31.0%).
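Several of the "similar" versus "significantly higher" judgments above can be read informally as whether the CEMR point estimate falls inside the corresponding survey 95% CI. The Python sketch below applies that simplified check to figures reported in the Results and tabulates the crude prevalence differences in percentage points. It is an illustrative heuristic only, not the statistical test used in the study, and it ignores the uncertainty of the CEMR estimates themselves. The helper `within_interval` is hypothetical.

```python
def within_interval(value: float, lower: float, upper: float) -> bool:
    """Return True if value lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper


# Office visits per 100 persons per year: CEMR estimate vs NAMCS 95% CI.
visits_similar = within_interval(284.6, 259.3, 296.5)
# Hypertension prevalence (%): CEMR estimate vs NHANES 95% CI.
hypertension_similar = within_interval(33.5, 27.0, 31.0)

# Overall prevalence (%) as reported: (CEMR, NHANES).
prevalence = {
    "diabetes": (10.1, 9.7),
    "obesity": (42.1, 39.6),
    "high cholesterol": (12.4, 12.4),
}
# Crude difference in percentage points (CEMR minus NHANES), rounded to 0.1.
diffs = {name: round(cemr - nhanes, 1) for name, (cemr, nhanes) in prevalence.items()}

print(visits_similar, hypertension_similar)  # True False
print(diffs)  # {'diabetes': 0.4, 'obesity': 2.5, 'high cholesterol': 0.0}
```

The visit rate passes the check while hypertension does not, which matches the direction of the reported findings. A formal comparison would also need the CEMR standard errors and the survey design variables.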
Conclusions
The distribution of major cardiometabolic diseases in the CEMR is comparable with the national survey results. The CEMR represents the general US population well in terms of office visits and major chronic conditions; however, potential subgroup differences in age and gender distribution and in disease prevalence should be carefully accounted for in future studies.