2023
DOI: 10.1186/s12874-023-01869-w

A method for generating synthetic longitudinal health data

Abstract: Getting access to administrative health data for research purposes is a difficult and time-consuming process due to increasingly demanding privacy regulations. An alternative method for sharing administrative health data would be to share synthetic datasets where the records do not correspond to real individuals, but the patterns and relationships seen in the data are reproduced. This paper assesses the feasibility of generating synthetic administrative health data using a recurrent deep learning model. Our da…

Cited by 14 publications (7 citation statements)
References 62 publications
“…The analytical conclusions were the same for the models developed using the pooled partially synthetic dataset as the ground truth model developed using federated analysis in various analytical steps including descriptive, univariable analysis and multivariable main effects and country interaction models. While previous observational studies have compared synthetic and real data 34 36 , there has been no population-based study testing the use of SDG for pooling datasets across jurisdictions and comparing it to a federated approach.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Note that other models have been developed which are able to generate EHRs with static attributes and sequential data [9][10][11][29]. We opt for DoppelGANger and CPAR since they are contained in easy-to-use open-source libraries, promoting reproducibility of this research.…”
Section: Synthetic Data Generating Models (citation type: mentioning)
confidence: 99%
“…, such that approximating p(z|X) can be framed as a simple binary classification task, since the optimization problems in Equation (11) and Equation (12) are equivalent. In this case, note that we make a continuous approximation of the binary latent variable on the (0,1) interval.…”
Section: Goodness-of-fit (citation type: mentioning)
confidence: 99%
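The excerpt above treats goodness-of-fit as a classification problem: if a model cannot tell real records (z = 0) from synthetic ones (z = 1), the two distributions are close. A minimal sketch of that general idea follows, using the propensity-score mean squared error (pMSE) with a hand-rolled logistic regression in pure Python. It is not the citing paper's procedure (its Equations (11) and (12) are not reproduced here); the function names, the toy Gaussian data, and the training settings are illustrative assumptions.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_logistic(X, z, lr=0.1, epochs=500):
    """Fit p(z = 1 | x) by batch gradient descent on the mean log-loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, zi in zip(X, z):
            err = sigmoid(b + dot(w, x)) - zi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pmse(real, synthetic, **fit_kwargs):
    """Propensity-score MSE (illustrative sketch, not the cited method).

    Real rows get label z = 0 and synthetic rows z = 1; a classifier
    estimates p(z = 1 | x). If it cannot tell the sets apart, every
    propensity stays near c = n_syn / N and pMSE is close to 0.
    """
    X = real + synthetic
    z = [0] * len(real) + [1] * len(synthetic)
    w, b = fit_logistic(X, z, **fit_kwargs)
    c = len(synthetic) / len(X)
    return sum((sigmoid(b + dot(w, x)) - c) ** 2 for x in X) / len(X)


def gaussian_rows(rng, n, mean):
    """Hypothetical toy records: n rows of independent N(mean_j, 1) features."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    real = gaussian_rows(rng, 200, [0.0, 0.0])
    good = gaussian_rows(rng, 200, [0.0, 0.0])  # same distribution as real
    bad = gaussian_rows(rng, 200, [2.0, 2.0])   # shifted distribution
    print(f"pMSE, faithful synthetic: {pmse(real, good):.4f}")
    print(f"pMSE, shifted synthetic:  {pmse(real, bad):.4f}")
```

With equal numbers of real and synthetic rows, c = 0.5, so pMSE ranges from 0 (indistinguishable) to 0.25 (perfectly separable). A plain logistic model only detects differences it can express linearly; a more flexible classifier would catch subtler discrepancies at the cost of a noisier score.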
“…Therefore, the methods proposed for generating synthetic medical data differ. For example, generational neural diffusion models, variation autoencoders, and a generative adversarial network (GAN) are mainly used to develop synthetic medical images/data [6, 7, 8], whereas algorithms such as Bayesian networks [9] and classification and regression trees [10, 11] are used to develop numerical (quantitative) and non-numerical (qualitative), and recurrent deep learning models are used to build time-series databases [12]. Because this study aims to develop a numerical database, the issues of generating images and developing non-numerical databases are beyond the scope of this study.…”
Section: Related Work (citation type: mentioning)
confidence: 99%