Objective:Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized to a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data is ‘fit’ for specific uses.Materials and Methods:DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions that were grouped into an overall conceptual framework. Feedback received from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The harmonized terminology and logical framework’s inclusiveness was evaluated against ten published DQ terminologies.Results:Existing DQ terms were harmonized and organized into a framework by defining three DQ categories: (1) Conformance (2) Completeness and (3) Plausibility and two DQ assessment contexts: (1) Verification and (2) Validation. Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified with organizational data, or validated against an accepted gold standard, depending on proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning to multiple published DQ terminologies.Discussion:Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data such as administrative, research, and patient-reported data.Conclusion:A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.
This study aims to predict death after COVID-19 using only the past medical information routinely collected in electronic health records (EHRs) and to understand the differences in risk factors across age groups. Combining computational methods and clinical expertise, we curated clusters that represent 46 clinical conditions as potential risk factors for death after a COVID-19 infection. We trained age-stratified generalized linear models (GLMs) with component-wise gradient boosting to predict the probability of death based on what we know from the patients before they contracted the virus. Despite only relying on previously documented demographics and comorbidities, our models demonstrated similar performance to other prognostic models that require an assortment of symptoms, laboratory values, and images at the time of diagnosis or during the course of the illness. In general, we found age as the most important predictor of mortality in COVID-19 patients. A history of pneumonia, which is rarely asked in typical epidemiology studies, was one of the most important risk factors for predicting COVID-19 mortality. A history of diabetes with complications and cancer (breast and prostate) were notable risk factors for patients between the ages of 45 and 65 years. In patients aged 65–85 years, diseases that affect the pulmonary system, including interstitial lung disease, chronic obstructive pulmonary disease, lung cancer, and a smoking history, were important for predicting mortality. The ability to compute precise individual-level risk scores exclusively based on the EHR is crucial for effectively allocating and distributing resources, such as prioritizing vaccination among the general population.
Many of the symptoms characterized as the post-acute sequelae of SARS-CoV-2 infection (PASC) could have multiple causes or similarly seen in non-COVID patients. An accurate identification of phenotypes will be important to guide future research and the healthcare system to focus its efforts and resources on adequately controlled age- and gender-specific sequelae of COVID-19 infection. This retrospective electronic health records (EHR) cohort study, we applied a computational framework for knowledge discovery from clinical data, MLHO, to identify phenotypes that positively associate with a past positive PCR test for COVID-19. We evaluated the post-test phenotypes in two temporal windows at 3-6 and 6-9 months after the test and by age and gender. We utilized longitudinal diagnosis records stored in EHRs from Mass General Brigham (MGB) 57 thousand patients who tested positive or negative for COVID-19 and were not hospitalized. Statistical analyses were performed on data from March 2020 to March 2021. PCR test results and subsequent diagnosis records that were recorded for the first time two months or later after the PCR test. We identified 28 phenotypes among different age/gender cohorts or time windows that positively associated with a past SARS-CoV-2 infection. All identified phenotypes were newly recorded in patients’ medical records two months or longer after a COVID-19 PCR test in non-hospitalized patients regardless of the test result. Among these phenotypes, a new diagnosis record for anosmia and dysgeusia (OR 2.17, 95% CI [1.42 - 3.25]), alopecia (OR 3.54, 95% CI [2.92 - 4.3]), chest pain (OR 1.35, 95% CI [1.16 - 1.56]), or chronic fatigue syndrome (OR 1.81-2.28, 95% CI [1.38 - 3.68]) are the most significant indicators of a past COVID-19 infection, especially among women younger than 65. Among men, edema (OR 1.83, 95% CI [1.23 - 2.66]) and disease of nail (OR 3.54, 95% CI [1.63 - 7.29]) in patients 65 and older or proteinuria (OR 2.66, 95% CI [1.61 - 4.34]) in patients under 65 are associated with a positive COVID-19 PCR test in the past few months. Our approach avoids a flood of false positive discoveries, while offering a more probabilistic flexible criterion than the standard linear phenome-wide association study (PheWAS). These findings suggest that some of the previously identified post sequelae of COVID-19 may not be accurate and that most of the PASC are observed in patients under 65 years of age.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.