2021
DOI: 10.1101/2021.07.06.21259051
Preprint

Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: Results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C)

Abstract: Objective: To evaluate whether synthetic data derived from a national COVID-19 data set could be used for geospatial and temporal epidemic analyses. Materials and Methods: Using an original data set (n=1,854,968 SARS-CoV-2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip-code level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity…
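The abstract describes comparing indicator counts stratified by month and zip code between the original data set and its synthetic derivative. A minimal sketch of that kind of comparison follows. The record layout `(month, zip_code, positive)`, the use of positive-test counts as the indicator, and the choice of Pearson correlation and maximum absolute cell difference as similarity measures are all assumptions for illustration; they are not the study's reported metrics, whose description is truncated in the abstract above.

```python
from collections import Counter
from math import nan, sqrt


def stratified_counts(records):
    """Count positive tests per (month, zip_code) cell.

    Assumed (hypothetical) record layout: (month "YYYY-MM", zip_code str, positive bool).
    """
    counts = Counter()
    for month, zip_code, positive in records:
        if positive:
            counts[(month, zip_code)] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; nan if undefined."""
    n = len(xs)
    if n < 2:
        return nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def compare(original, synthetic):
    """Compare month-by-zip indicator counts of an original and a synthetic data set."""
    orig = stratified_counts(original)
    synth = stratified_counts(synthetic)
    # Union of cells, so a cell present in only one data set counts as 0 in the other
    # (Counter returns 0 for missing keys).
    cells = sorted(set(orig) | set(synth))
    xs = [orig[c] for c in cells]
    ys = [synth[c] for c in cells]
    return {
        "cells": len(cells),
        "pearson_r": pearson(xs, ys),
        "max_abs_diff": max((abs(x - y) for x, y in zip(xs, ys)), default=0),
    }
```

For example, `compare(original_tests, synthetic_tests)` returns the number of month-by-zip cells, the correlation of their counts, and the largest per-cell discrepancy. Taking the union of cells matters: a synthetic generator that drops or invents sparse zip codes should be penalized rather than silently ignored.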


Cited by 3 publications (4 citation statements). References: 34 publications.
“…Using synthetic data for model building could lower cost, reduce barriers to entry, ease external validation using datasets from multiple health‐care systems, and facilitate hypotheses generation of disease mechanisms. Various health systems are already using synthetic datasets for quality improvement and medical research 13,14,27,48 …”
Section: Discussion (mentioning)
confidence: 99%
“…Various health systems are already using synthetic datasets for quality improvement and medical research. 13,14,27,48 A study limitation is the predominance of male and White subjects in this cohort with potentially greater exposure to traumatic brain injury and post-traumatic stress disorder in combat veterans and applicability to a more diverse patient population or other health-care systems needs empirical testing. Our prior study showed that the performance of a machine learning model to predict AD onset using blood pressure trajectories trained using VA EHR data was similar when applied to University of Michigan EHR data even though the demographic compositions are different 12 .…”
Section: Conflict of Interest Statement (mentioning)
confidence: 99%
“…The VHA data is enriched for individuals whose privacy is a matter of national security concern. Additionally, synthetic COVID-19 patients were demonstrated to be an effective surrogate for a variety of public health and clinical tasks ( 23 , 24 ). Therefore, synthetic data may play a critical role in modeling COVID-19 and developing AI applications in general within the VHA.…”
Section: Introduction (mentioning)
confidence: 99%