2022
DOI: 10.48550/arxiv.2205.03257
Preprint

Synthetic Data -- what, why and how?

Abstract: This explainer document aims to provide an overview of the current state of the rapidly expanding work on synthetic data technologies, with a particular focus on privacy. The article is intended for a non-technical audience, though some formal definitions have been given to provide clarity to specialists. This article is intended to enable the reader to quickly become familiar with the notion of synthetic data, as well as understand some of the subtle intricacies that come with it. We do believe that synthetic…

Cited by 23 publications (18 citation statements)
References 141 publications
“…Generating synthetic data with privacy guarantees provides a promising alternative, allowing meaningful research to be carried out at scale [15,14,33]. Together with traditional data augmentation techniques (e.g., geometric transformations), these synthetic data could complement real data to dramatically increase the training set of machine learning models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, the necessary data are divided among several institutions, and privacy concerns additionally prevent the merging of these data. Synthetic data from generative models, such as DALL-E 2, show promise for addressing these issues by enabling the creation of data sets that are much larger than those that are currently available and greatly accelerating the development of new deep learning tools for radiology [11, 12].…”
Section: Discussion (mentioning)
confidence: 99%
“…Similar datasets for production network reconstruction are not currently available and, due to the confidential or proprietary nature of such data, its assembly seems unlikely in the near future. The research community should unite to devise strategies to circumvent this issue, possibly by considering the use of synthetic data [74] as an alternative to real data. While synthetic data generation is currently an active and exciting area of research, it is less well-developed for networks than for tabular data and still suffers from either a lack of privacy guarantees (for traditional methods) or a lack of interpretability of the privacy guarantees (for differential privacy).…”
Section: How Can We Learn More? (mentioning)
confidence: 99%