2023
DOI: 10.48550/arxiv.2302.04062
Preprint

Machine Learning for Synthetic Data Generation: a Review

Abstract: Data plays a crucial role in machine learning. However, in real-world applications, there are several problems with data, e.g., data are of low quality; a limited number of data points lead to under-fitting of the machine learning model; it is hard to access the data due to privacy, safety and regulatory concerns. Synthetic data generation offers a promising new avenue, as it can be shared and used in ways that real-world data cannot. This paper systematically reviews the existing works that leverage machine l…

Cited by 19 publications (27 citation statements)
References: 88 publications
“…It is the favored method for increasing a training dataset's amount and diversity to improve DL accuracy. 29,30 It entails that an actual product was not captured for the image dataset but was created by the model based on patterns and features learned from real images. 31 This technology is employed in numerous other industries, including healthcare, 32 commerce, manufacturing, agriculture, 33 and more.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Machine learning algorithms often require a substantial amount of labeled data for effective training, but collecting such data in real-world experiments can be time-consuming and costly, which is particularly relevant in atomic BEC experiments. Recently, to tackle the issue of obtaining a costly labeled training set for machine learning algorithms, the use of synthetic data has been proposed [46][47][48]. This is especially attractive as the GPE can generate synthetic BEC images for CNN training with parameters that are similar to those of the experimental setup.…”
Section: Introduction (mentioning)
confidence: 99%
“…But besides the exploration of these proof of concepts in this field, research about other use cases is still ongoing. Our use case focuses on the generation of AI narratives, which is still a relatively novel research field (Lu et al 2023).…”
Section: Introduction (mentioning)
confidence: 99%