Objective: COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. Methods: The Clinical and Translational Science Award (CTSA) Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. Organized in inclusive workstreams, in two months we created: legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. Discussion: The N3C has demonstrated that a multi-site collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multi-organizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
LAY SUMMARY: COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though medical records are abundant, they are largely inaccessible to outside researchers. Statistical, machine learning, and causal research are most successful with large datasets beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many clinical centers to reveal patterns in COVID-19 patients. To create N3C, the community had to overcome technical, regulatory, policy, and governance barriers to sharing patient-level clinical data. In less than 2 months, we developed solutions to acquire and harmonize data across organizations and created a secure data environment to enable transparent and reproducible collaborative research. We expect the N3C to help save lives by enabling collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care needs and thereby reduce the immediate and long-term impacts of COVID-19.
Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. As the worldwide scientific community forges ahead with efforts to characterize a wide range of outcomes associated with SARS-CoV-2 infection, the proliferation of available data has made it clear that formal definitions are needed to design robust studies of Long COVID that consistently capture variation in long-term outcomes. In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.
The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and ongoing coronavirus disease 2019 (COVID-19) pandemic underscores the need for new treatments. Here, we report that cannabidiol (CBD) inhibits infection of SARS-CoV-2 in cells and mice. CBD and its metabolite 7-OH-CBD, but not THC or other congeneric cannabinoids tested, potently block SARS-CoV-2 replication in lung epithelial cells. CBD acts after viral entry, inhibiting viral gene expression and reversing many effects of SARS-CoV-2 on host gene transcription. CBD inhibits SARS-CoV-2 replication in part by up-regulating the host IRE1α ribonuclease endoplasmic reticulum (ER) stress response and interferon signaling pathways. In matched groups of human patients from the National COVID Cohort Collaborative, CBD (100 mg/ml oral solution per medical records) had a significant negative association with positive SARS-CoV-2 tests. This study highlights CBD as a potential preventative agent for early-stage SARS-CoV-2 infection and merits future clinical trials. We caution against current use of non-medical formulations as a preventative or treatment therapy.
Background: In a multisite clinical research collaboration, institutions may or may not use the same common data model (CDM) to store clinical data. To overcome this challenge, we proposed to use Health Level 7’s Fast Healthcare Interoperability Resources (FHIR) as a meta-CDM, a single standard to represent clinical data. Objective: In this study, we aimed to create an open-source application termed the Clinical Asset Mapping Program for FHIR (CAMP FHIR) to efficiently transform clinical data to FHIR for supporting source-agnostic CDM-to-FHIR mapping. Methods: Mapping with CAMP FHIR involves (1) mapping each source variable to its corresponding FHIR element and (2) mapping each item in the source data’s value sets to the corresponding FHIR value set item for variables with strict value sets. To date, CAMP FHIR has been used to transform 108 variables from the Informatics for Integrating Biology & the Bedside (i2b2) and Patient-Centered Outcomes Research Network data models to fields across 7 FHIR resources. It is designed to allow input from any source data model and will support additional FHIR resources in the future. Results: We have used CAMP FHIR to transform data on approximately 23,000 patients with asthma from our institution’s i2b2 database. Data quality and integrity were validated against the origin point of the data, our enterprise clinical data warehouse. Conclusions: We believe that CAMP FHIR can serve as an alternative to implementing new CDMs on a project-by-project basis. Moreover, the use of FHIR as a CDM could support rare data sharing opportunities, such as collaborations between academic medical centers and community hospitals. We anticipate adoption and use of CAMP FHIR to foster sharing of clinical data across institutions for downstream applications in translational research.
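The two mapping steps described in the Methods of this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CAMP FHIR's actual implementation: the source column names (`sex_cd`, `birth_date`, `zip`), both mapping tables, and the function `map_row_to_fhir` are hypothetical, and the output is a simplified dictionary that only resembles a FHIR Patient resource.

```python
# Hypothetical sketch of a two-step source-to-FHIR mapping.
# Step 1: map each source variable to a FHIR element.
# Step 2: map source value-set items to FHIR value-set items.

# Assumed source column -> (FHIR resource type, FHIR element). Illustrative only.
VARIABLE_MAP = {
    "sex_cd": ("Patient", "gender"),
    "birth_date": ("Patient", "birthDate"),
}

# Assumed source codes -> FHIR administrative-gender codes.
VALUE_SET_MAP = {
    "gender": {"M": "male", "F": "female", "O": "other"},
}


def map_row_to_fhir(row):
    """Turn one flat source record into simplified FHIR-like resources."""
    resources = {}
    for column, value in row.items():
        if column not in VARIABLE_MAP:
            continue  # source variables without a mapping are skipped
        resource_type, element = VARIABLE_MAP[column]
        # Value-set translation applies only to elements with a strict value set;
        # unrecognized source codes fall back to "unknown".
        if element in VALUE_SET_MAP:
            value = VALUE_SET_MAP[element].get(value, "unknown")
        resource = resources.setdefault(
            resource_type, {"resourceType": resource_type}
        )
        resource[element] = value
    return list(resources.values())


# Made-up example record; "zip" has no mapping and is dropped.
example = map_row_to_fhir(
    {"sex_cd": "F", "birth_date": "1980-04-02", "zip": "27599"}
)
```

Keeping the variable map and the value-set map as separate tables mirrors the two steps in the text: adding a new source data model would, in this sketch, mean supplying new tables rather than new code.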
Background: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations. Methods: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using four federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. Results: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source CDM conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback. Discussion: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for data quality improvement that will support improved research analytics locally and in aggregate. Conclusion: By combining rapid, continual assessment of DQ with a large volume of multi-site data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.
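To illustrate why centralized benchmarking can surface issues that a single site cannot see on its own, the sketch below compares one per-site metric against the cross-site distribution. It is a hypothetical example, not one of the 15 N3C heuristics: the metric, the site names, the values, the threshold, and the function `flag_outlier_sites` are all invented for illustration, and the median absolute deviation (MAD) is simply one convenient outlier rule.

```python
from statistics import median


def flag_outlier_sites(site_metric, tolerance=3.0):
    """Return sites whose metric lies more than `tolerance` MADs from the median.

    `site_metric` maps a site name to a single numeric DQ metric,
    e.g. the fraction of records missing a required field (assumed).
    """
    values = list(site_metric.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All typical sites agree exactly; flag anything that differs.
        return sorted(s for s, v in site_metric.items() if v != center)
    return sorted(
        s for s, v in site_metric.items() if abs(v - center) / mad > tolerance
    )


# Made-up missingness rates for five hypothetical sites.
rates = {
    "site_a": 0.02,
    "site_b": 0.03,
    "site_c": 0.025,
    "site_d": 0.40,
    "site_e": 0.03,
}
flagged = flag_outlier_sites(rates)
```

In this toy data only `site_d` stands out, and it does so only relative to its peers, which is the kind of comparison a purely federated check run within a single site would not have access to.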