A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data

Self Cite

Objective: To compare rule-based data quality (DQ) assessment approaches across multiple national clinical data sharing organizations. Methods:Six organizations with established data quality assessment (DQA) programs provided documentation or source code describing current DQ checks. DQ checks were mapped to the categories within the data verification context of the harmonized DQA terminology. To ensure all DQ checks were consistently mapped, conventions were developed and four iterations of mapping performed. Difficultto-map DQ checks were discussed with research team members until consensus was achieved.Results: Participating organizations provided 11,026 DQ checks, of which 99.97 percent were successfully mapped to a DQA category. Of the mapped DQ checks (N=11,023), 214 (1.94 percent) mapped to multiple DQA categories. The majority of DQ checks mapped to Atemporal Plausibility (49.60 percent), Value Conformance (17.84 percent), and Atemporal Completeness (12.98 percent) categories. Discussion: Using the common DQA terminology, near-complete (99.97 percent) coverage across a wide range of DQA programs and specifications was reached. Comparing the distributions of mapped DQ checks revealed important differences between participating organizations. This variation may be related to the organization's stakeholder requirements, primary analytical focus, or maturity of their DQA program. Not within scope, mapping checks within the data validation context of the terminology may provide additional insights into DQA practice differences. Conclusion:A common DQA terminology provides a means to help organizations and researchers understand the coverage of their current DQA efforts as well as highlight potential areas for additional DQA development. Sharing DQ checks between organizations could help expand the scope of DQA across clinical data networks.

Section: Systematized Nomenclature Of Medicine (Snomed) Versus Logicamentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

A Comparison of Data Quality Assessment Checks in Six Data Sharing Networks

Callahan¹,

Bauck²,

Bertoch³

et al. 2017

Self Cite

“…We have also implemented a number of statistical techniques to assess data quality, including conformance, completeness and plausibility [9]. For categorical variables (e.g., race/ethnicity), extent of missing data were evaluated.…”

Section: Step 3 Utilize a Rigorous Approach To Refine Variables And mentioning

confidence: 99%

A Framework for Leveraging “Big Data” to Advance Epidemiology and Improve Quality: Design of the VA Colonoscopy Collaborative

Gupta

Liu

Patterson

et al. 2018

Objective:To describe a framework for leveraging big data for research and quality improvement purposes and demonstrate implementation of the framework for design of the Department of Veterans Affairs (VA) Colonoscopy Collaborative.Methods:We propose that research utilizing large-scale electronic health records (EHRs) can be approached in a 4 step framework: 1) Identify data sources required to answer research question; 2) Determine whether variables are available as structured or free-text data; 3) Utilize a rigorous approach to refine variables and assess data quality; 4) Create the analytic dataset and perform analyses. We describe implementation of the framework as part of the VA Colonoscopy Collaborative, which aims to leverage big data to 1) prospectively measure and report colonoscopy quality and 2) develop and validate a risk prediction model for colorectal cancer (CRC) and high-risk polyps.Results:Examples of implementation of the 4 step framework are provided. To date, we have identified 2,337,171 Veterans who have undergone colonoscopy between 1999 and 2014. Median age was 62 years, and 4.6 percent (n = 106,860) were female. We estimated that 2.6 percent (n = 60,517) had CRC diagnosed at baseline. An additional 1 percent (n = 24,483) had a new ICD-9 code-based diagnosis of CRC on follow up.Conclusion:We hope our framework may contribute to the dialogue on best practices to ensure high quality epidemiologic and quality improvement work. As a result of implementation of the framework, the VA Colonoscopy Collaborative holds great promise for 1) quantifying and providing novel understandings of colonoscopy outcomes, and 2) building a robust approach for nationwide VA colonoscopy quality reporting.

“…Kahn and colleagues have developed a pragmatic framework for data quality assessment in EHRs. Their framework assesses “fitness for use” in health care QI and comparative effectiveness research (CER) [71316]. The National Institutes of Health (NIH) National Healthcare Research Systems Collaboratory has recently added a data quality review criterion for its research projects [17].…”

Section: Introductionmentioning

confidence: 99%

Automating Electronic Clinical Data Capture for Quality Improvement and Research: The CERTAIN Validation Project of Real World Evidence

Devine¹,

Eaton²,

Zadworny³

et al. 2018

Background:The availability of high fidelity electronic health record (EHR) data is a hallmark of the learning health care system. Washington State’s Surgical Care Outcomes and Assessment Program (SCOAP) is a network of hospitals participating in quality improvement (QI) registries wherein data are manually abstracted from EHRs. To create the Comparative Effectiveness Research and Translation Network (CERTAIN), we semi-automated SCOAP data abstraction using a centralized federated data model, created a central data repository (CDR), and assessed whether these data could be used as real world evidence for QI and research.Objectives:Describe the validation processes and complexities involved and lessons learned.Methods:Investigators installed a commercial CDR to retrieve and store data from disparate EHRs. Manual and automated abstraction systems were conducted in parallel (10/2012-7/2013) and validated in three phases using the EHR as the gold standard: 1) ingestion, 2) standardization, and 3) concordance of automated versus manually abstracted cases. Information retrieval statistics were calculated.Results:Four unaffiliated health systems provided data. Between 6 and 15 percent of data elements were abstracted: 51 to 86 percent from structured data; the remainder using natural language processing (NLP). In phase 1, data ingestion from 12 out of 20 feeds reached 95 percent accuracy. In phase 2, 55 percent of structured data elements performed with 96 to 100 percent accuracy; NLP with 89 to 91 percent accuracy. In phase 3, concordance ranged from 69 to 89 percent. Information retrieval statistics were consistently above 90 percent.Conclusions:Semi-automated data abstraction may be useful, although raw data collected as a byproduct of health care delivery is not immediately available for use as real world evidence. New approaches to gathering and analyzing extant data are required.