BACKGROUND
Electronic Health Records (EHRs) have enormous potential to advance medical research and practice through easily accessible and interpretable EHR-derived databases. Realizing this potential is limited by issues with data quality and performance assessment.
OBJECTIVE
This review aims to consolidate current best practices for EHR Data Quality and Performance assessment into a replicable standard for researchers in the field.
METHODS
PubMed was systematically searched from inception until May 7, 2023, for original research articles assessing EHR data quality and/or performance.
RESULTS
Our search yielded 26 original research articles. Most articles suffered from one or more significant limitations, including incomplete or inconsistent reporting (30%), poor replicability (25%), and limited generalizability of results (25%). Completeness (81%), Conformance (69%), and Plausibility (62%) were the most cited indicators of Data Quality, while Correctness/Accuracy (54%) was the most cited indicator of Data Performance, with context-specific supplementation by Recency (27%), Fairness (23%), Stability (15%), and Shareability (8%) assessments. Artificial Intelligence (AI)-based techniques, including natural language data extraction, data imputation, and fairness algorithms, were shown to play a growing role in improving both dataset quality and performance.
CONCLUSIONS
This review highlights the need to incentivize and standardize data quality and performance assessments. The results suggest that adopting AI-based techniques can enhance data quality and performance, helping to unlock the full potential of EHRs to improve medical research and practice.