2020
DOI: 10.1016/j.neucom.2019.12.136

Generation and evaluation of privacy preserving synthetic health data

Abstract: We develop metrics for measuring the quality of synthetic health data for both education and research. We use novel and existing metrics to capture a synthetic dataset's resemblance, privacy, utility and footprint. Using these metrics, we develop an end-to-end workflow based on our generative adversarial network (GAN) method, HealthGAN, that creates privacy preserving synthetic health data. Our workflow meets privacy specifications of our data partner: (1) the HealthGAN is trained inside a secure environment; …
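The excerpt names resemblance and privacy metrics but does not define them. As an assumed illustration only, the sketch below computes a nearest-neighbour adversarial accuracy: for each real point, check whether its nearest synthetic neighbour is farther than its nearest other real point, and vice versa, then average. A value near 0.5 suggests real and synthetic data are hard to tell apart, near 1 suggests poor resemblance, and near 0 suggests memorisation. The `privacy_loss` helper (held-out score minus training score) and all function names are hypothetical, not taken from HealthGAN's code.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def nn_distances(queries: List[Point], pool: List[Point], same_set: bool) -> List[float]:
    """Euclidean distance from each query to its nearest neighbour in pool.

    If same_set is True, queries and pool are the same list and each point's
    distance to itself is skipped (leave-one-out).
    """
    out = []
    for i, q in enumerate(queries):
        best = math.inf
        for j, p in enumerate(pool):
            if same_set and i == j:
                continue
            best = min(best, math.dist(q, p))
        out.append(best)
    return out


def adversarial_accuracy(real: List[Point], synth: List[Point]) -> float:
    """Assumed nearest-neighbour adversarial accuracy between real and synthetic data."""
    d_ts = nn_distances(real, synth, same_set=False)   # real -> nearest synthetic
    d_tt = nn_distances(real, real, same_set=True)     # real -> nearest other real
    d_st = nn_distances(synth, real, same_set=False)   # synthetic -> nearest real
    d_ss = nn_distances(synth, synth, same_set=True)   # synthetic -> nearest other synthetic
    left = sum(a > b for a, b in zip(d_ts, d_tt)) / len(real)
    right = sum(a > b for a, b in zip(d_st, d_ss)) / len(synth)
    return 0.5 * (left + right)


def privacy_loss(train: List[Point], test: List[Point], synth: List[Point]) -> float:
    """Hypothetical privacy score: held-out accuracy minus training accuracy."""
    return adversarial_accuracy(test, synth) - adversarial_accuracy(train, synth)


train = [(0, 0), (1, 0), (0, 1), (1, 1)]
held_out = [(10, 10), (11, 10), (10, 11), (11, 11)]
copied = list(train)  # a "generator" that simply memorises its training rows
print(adversarial_accuracy(train, copied))      # → 0.0
print(privacy_loss(train, held_out, copied))    # → 1.0
```

The brute-force neighbour search is quadratic and only meant for small examples; a real evaluation would use a spatial index.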

Cited by 107 publications (125 citation statements); references 20 publications.
“…Jackson and Lussetti tested medGAN on an extended dataset containing demographic and health system usage information, obtaining results similar to the original (Jackson and Lussetti, 2019). HealthGAN, based on WGAN-GP, includes a data transformation method adapted from the Synthetic Data Vault (Patki et al, 2016) to map categorical features to and from the unit numerical range (Yale et al, 2020).…”
Section: Auto-encoders and Categorical Features (citation type: mentioning; confidence: 99%)
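The quoted passage says HealthGAN adapts a Synthetic Data Vault transformation to map categorical features to and from the unit numerical range. The sketch below assumes the SDV scheme as we understand it: [0, 1] is split into one sub-interval per category, with width equal to the category's frequency; encoding draws a value from a Gaussian centred in the interval and truncated to it; decoding returns the category whose interval contains the value. The class `CategoricalUnitEncoder` and its methods are hypothetical and are not the SDV or HealthGAN API.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Tuple


class CategoricalUnitEncoder:
    """Hypothetical encoder mapping categories to sub-intervals of [0, 1]."""

    def __init__(self, seed: int = 0) -> None:
        self.intervals: Dict[Hashable, Tuple[float, float]] = {}
        self.rng = random.Random(seed)

    def fit(self, values: List[Hashable]) -> "CategoricalUnitEncoder":
        counts = Counter(values)
        n = len(values)
        start = 0.0
        # Most frequent category first; each interval's width equals its frequency.
        for cat, c in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
            end = start + c / n
            self.intervals[cat] = (start, end)
            start = end
        return self

    def transform(self, values: List[Hashable]) -> List[float]:
        out = []
        for v in values:
            a, b = self.intervals[v]
            mu, sigma = (a + b) / 2, (b - a) / 6
            # Rejection sampling: a Gaussian centred on the interval, truncated to [a, b).
            x = self.rng.gauss(mu, sigma)
            while not a <= x < b:
                x = self.rng.gauss(mu, sigma)
            out.append(x)
        return out

    def inverse_transform(self, xs: List[float]) -> List[Hashable]:
        last = next(reversed(self.intervals))  # category owning the top interval
        out = []
        for x in xs:
            x = min(max(x, 0.0), 1.0)  # clip generator outputs into the unit range
            match = last  # x == 1.0 (or float round-off) falls into the top interval
            for cat, (a, b) in self.intervals.items():
                if a <= x < b:
                    match = cat
                    break
            out.append(match)
        return out


enc = CategoricalUnitEncoder(seed=1).fit(["a", "b", "a", "c", "a", "b"])
codes = enc.transform(["a", "c"])
print(enc.inverse_transform(codes))  # → ['a', 'c']
```

Sampling inside the interval, rather than always emitting its midpoint, keeps the encoded column continuous, which suits a GAN whose generator outputs real numbers; the clip in `inverse_transform` handles generator outputs that stray outside [0, 1].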
“…Dimension-wise distribution: The real and synthetic data are compared feature-wise according to a variety of methods. For example, the Bernoulli success probability for binary features, Student's t-test for continuous variables, and Pearson's chi-square test for binary variables are used to determine statistical significance (Beaulieu-Jones et al, 2019; Choi et al, 2017a; Chin-Cheong et al, 2019; Baowaly et al, 2019; Baowaly et al, 2018; Ozyigit et al, 2020; Tantipongpipat et al, 2019; Fisher et al, 2019; Wang et al, 2019a; Yale et al, 2019a; Chin-Cheong et al, 2020).…”
Section: Metric Description (citation type: mentioning; confidence: 99%)
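The feature-wise comparisons named in the quote can be sketched with the Python standard library alone. This is an assumed illustration, not the cited authors' code: `bernoulli_probs` gives each binary feature's success probability, `chi_square_binary` runs an uncorrected Pearson chi-square test on the 2x2 table of 0/1 counts (real vs synthetic), and `welch_t` returns only the t statistic. The unequal-variance (Welch) form is our choice and may differ from the Student's t-test used in the cited works. All function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def bernoulli_probs(rows: Sequence[Sequence[int]]) -> List[float]:
    """Per-feature Bernoulli success probability (share of 1s) for 0/1 data."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def chi_square_binary(real_col: Sequence[int], synth_col: Sequence[int]) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on the 2x2 table of
    0/1 counts, real vs synthetic. Returns (statistic, p-value) with 1 df."""
    table = [[sum(1 for v in col if v == k) for k in (0, 1)] for col in (real_col, synth_col)]
    total = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [table[0][k] + table[1][k] for k in (0, 1)]
    stat = 0.0
    for i in (0, 1):
        for k in (0, 1):
            expected = row_tot[i] * col_tot[k] / total
            if expected > 0:  # an all-0 or all-1 feature contributes nothing
                stat += (table[i][k] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def welch_t(real: Sequence[float], synth: Sequence[float]) -> float:
    """Welch's t statistic for a continuous feature (needs >= 2 values per sample)."""
    se = math.sqrt(variance(real) / len(real) + variance(synth) / len(synth))
    return (mean(real) - mean(synth)) / se


real = [[1, 0], [1, 1], [0, 0], [1, 0]]
synth = [[1, 0], [0, 1], [1, 0], [1, 1]]
print(bernoulli_probs(real), bernoulli_probs(synth))  # → [0.75, 0.25] [0.75, 0.5]
print(chi_square_binary([r[0] for r in real], [s[0] for s in synth]))  # → (0.0, 1.0)
```

Plotting the two probability lists against each other (one point per feature) gives the usual dimension-wise probability plot; points far from the diagonal flag features the generator reproduces poorly.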