Proceedings of the 14th International Conference on Availability, Reliability and Security 2019
DOI: 10.1145/3339252.3339281

On the Utility of Synthetic Data

Abstract: With the recent advances and increasing activities in data mining and analysis, the protection of the privacy of individuals is crucial. Several approaches address this concern, from techniques like data anonymisation to secure, non-disclosive computation, all of which have their specific strengths and weaknesses, depending on the specific requirements. A slightly different approach is the generation of synthetic data, which tries to preserve the overall properties and characteristics of the original data with…

Cited by 69 publications (40 citation statements)
References 8 publications
“…While real behavior data are valuable and sensitive, in many domains, this type of data is unavailable, which makes the usage of synthetic data solutions attractive (Nikolenko, 2019). In most cases, the models trained on the synthetic data are almost as effective as models trained on the real data (Hittmeir et al, 2019). Synthetic data approaches are popular in healthcare, because of the obvious concerns for data sensitivity and privacy, where the models trained on synthetic data show only small decreases in accuracy when compared to the models trained on the real data (Rankin et al, 2020).…”
Section: The Potential Of Synthetic Data Application In Cdts (mentioning)
confidence: 99%
“…Therefore, it is important to evaluate N3C synthetic data in a manner that can inform users with a wide range of intended use cases and definitions for synthetic data fitness for use. [25] The utility of synthetic health data has been evaluated in other work [15,19,20,26-30] outside of N3C which applied a variety of the ways one can validate synthetic data. [31] However, N3C synthetic data utility has only been evaluated once before.…”
Section: Background and Significance (mentioning)
confidence: 99%
“…In evaluating multivariate relationships, Rankin et al, 22 Yale et al, 8 Wang et al, 29 Bourou et al, 25 Hittmeir et al, 39 Dankar et al, 41 and Rashidian et al 32 visually compared the Pairwise Pearson Correlation (PPC) matrices to assess whether correlations between attributes of RD are maintained in STD. Additionally, principal component analysis transformation has been used by Yale et al 8 to compare the dimensional properties of STD and RD.…”
Section: Introduction (mentioning)
confidence: 99%
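
The pairwise Pearson correlation (PPC) comparison described in the last statement can be sketched as follows. This is only an illustrative sketch, not the procedure used in any of the cited studies: those works compared the matrices visually, whereas the single-number summary here (largest absolute entry-wise gap) is an assumption added for illustration. The toy tables and all function names (`pearson`, `ppc_matrix`, `max_abs_difference`, `make_toy_table`) are hypothetical, and the "synthetic" table is simply a second sample from the same toy process standing in for a generator's output.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ppc_matrix(columns):
    """Pairwise Pearson correlation matrix for a list of numeric columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def max_abs_difference(m1, m2):
    """Largest absolute entry-wise gap between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


def make_toy_table(rng, n_rows, slope):
    """Hypothetical table: column b depends linearly on a, column c is noise."""
    a = [rng.gauss(0, 1) for _ in range(n_rows)]
    b = [slope * v + rng.gauss(0, 1) for v in a]
    c = [rng.gauss(0, 1) for _ in range(n_rows)]
    return [a, b, c]


rng = random.Random(0)
real = make_toy_table(rng, 500, slope=2.0)
synthetic = make_toy_table(rng, 500, slope=2.0)  # stand-in for generator output

gap = max_abs_difference(ppc_matrix(real), ppc_matrix(synthetic))
print(f"largest correlation gap: {gap:.3f}")
```

A small gap suggests that the correlations between attributes of the real data are largely maintained in the synthetic data; what counts as "small" depends on the intended use and is not specified by the statements above.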