Introduction
MedicineInsight is a database containing de-identified electronic health records (EHRs) from over 700 Australian general practices. Previous research validated the algorithms used to derive medical condition flags in MedicineInsight, but the accuracy of data fields following extraction from clinical practice EHRs and transformation in the data warehouse has not been formally validated.
Objectives
To examine the accuracy of the extraction and transformation of EHR fields for selected demographics, observations, diagnoses, prescriptions, and tests into MedicineInsight.
Methods
We benchmarked MedicineInsight values against those recorded in the original EHRs. Forty-six general practices contributing data to MedicineInsight met our eligibility criteria; eight were randomly selected, of which four agreed to participate. Within each participating practice, we randomly selected 200 patients ≥ 18 years of age from MedicineInsight. Trained staff reviewed the original EHRs for the selected patients and recorded data from the relevant fields. We calculated the percentage of agreement (POA) between MedicineInsight and EHR data for all fields; Cohen's kappa for categorical measures and intra-class correlation (ICC) for continuous measures; and sensitivity, specificity, and positive and negative predictive values (PPV/NPV) for diagnoses.
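The agreement measures named above can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names and the toy data are hypothetical, the original EHR is treated as the reference standard (consistent with the benchmarking described above), and ICC is omitted because the abstract does not state which ICC form was used.

```python
from collections import Counter


def percent_agreement(reference, extracted):
    """Percentage of records where the two sources hold the same value."""
    if len(reference) != len(extracted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(r == e for r, e in zip(reference, extracted))
    return 100.0 * matches / len(reference)


def cohens_kappa(reference, extracted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = sum(r == e for r, e in zip(reference, extracted)) / n
    ref_counts, ext_counts = Counter(reference), Counter(extracted)
    categories = ref_counts.keys() | ext_counts.keys()
    # Chance agreement from the marginal category frequencies of each source.
    p_expected = sum(ref_counts[c] * ext_counts[c] for c in categories) / n**2
    if p_expected == 1.0:
        return float("nan")  # undefined when both sources are constant
    return (p_observed - p_expected) / (1.0 - p_expected)


def diagnostic_accuracy(reference, extracted):
    """Sensitivity, specificity, PPV and NPV for a binary diagnosis flag.

    `reference` is the flag read from the original EHR (assumed reference
    standard); `extracted` is the flag in the extracted dataset.
    """
    pairs = list(zip(reference, extracted))
    tp = sum(r and e for r, e in pairs)
    tn = sum((not r) and (not e) for r, e in pairs)
    fp = sum((not r) and e for r, e in pairs)
    fn = sum(r and (not e) for r, e in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data for ten hypothetical patients (1 = diagnosis recorded).
ehr_flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
extracted_flags = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

poa = percent_agreement(ehr_flags, extracted_flags)
kappa = cohens_kappa(ehr_flags, extracted_flags)
accuracy = diagnostic_accuracy(ehr_flags, extracted_flags)
```

In this toy example the two sources agree on eight of ten patients, so POA is 80%, while kappa is lower (about 0.52) because part of that agreement is expected by chance given how rare the diagnosis is.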
Results
A total of 796 patients were included in our analysis. All demographic characteristics, observations, diagnoses, prescriptions and random pathology test results had excellent (> 90%) POA, Kappa, and ICC. POA for the most recent pathology/imaging test was moderate (81%; 95% CI: 78% to 84%). Sensitivity, specificity, PPV, and NPV were excellent (> 90%) for all but one of the examined diagnoses, which had a poor PPV.
Conclusions
Overall, our study shows good agreement between the majority of MedicineInsight data and those from the original EHRs, suggesting that MedicineInsight data extraction and warehousing procedures accurately conserve the data in these key fields. Discrepancies in test data may have arisen from differences in how data from pathology, radiology and other imaging providers are stored in EHRs and in MedicineInsight; this requires further investigation.