2020
DOI: 10.48550/arxiv.2002.06307
Preprint

Uncertainties associated with GAN-generated datasets in high energy physics

Konstantin T. Matchev, Alexander Roman, Prasanth Shyamsundar

Abstract: Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this paper we present an argument cautioning against the usage of this method to meet the simulation requirements of an experiment, namely that data generated by a GAN cannot statistically be better than the data it was trained on.
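A minimal toy sketch (not from the paper) of the statistical point made in the abstract: drawing many events from a generative model that was itself fit to N training events does not shrink the uncertainty of a downstream estimate below what the N training events already provide. A simple Gaussian fit stands in for the GAN here; all names, sample sizes, and the choice of observable are illustrative assumptions.

```python
# Toy illustration (assumption-laden sketch, not the paper's code):
# a Gaussian fit plays the role of the GAN, and the "analysis" is
# estimating the true mean of the underlying distribution.
import numpy as np

rng = np.random.default_rng(0)

N_TRAIN = 1_000     # size of the "traditionally simulated" training sample
N_GAN = 100_000     # size of the "GAN-amplified" sample drawn from the fit
N_TOYS = 500        # pseudo-experiments used to measure estimator spread
TRUE_MEAN, TRUE_STD = 0.0, 1.0

est_train, est_gan = [], []
for _ in range(N_TOYS):
    train = rng.normal(TRUE_MEAN, TRUE_STD, N_TRAIN)

    # "Generative model": a Gaussian whose parameters come only from the
    # training sample.  Drawing more events from it adds no new information
    # about TRUE_MEAN beyond what the N_TRAIN events already contain.
    gan_like = rng.normal(train.mean(), train.std(ddof=1), N_GAN)

    est_train.append(train.mean())
    est_gan.append(gan_like.mean())

print(f"spread of mean estimate, {N_TRAIN} training events : {np.std(est_train):.4f}")
print(f"spread of mean estimate, {N_GAN} generated events : {np.std(est_gan):.4f}")
print(f"naive 1/sqrt(N) expectation for {N_GAN} i.i.d. events: {TRUE_STD/np.sqrt(N_GAN):.4f}")
```

In this sketch the spread of the estimate from the amplified sample stays at roughly the 1/sqrt(N_TRAIN) level rather than dropping to 1/sqrt(N_GAN), which is the sense in which the generated data "cannot statistically be better" than its training data under these assumptions.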

Cited by 5 publications (6 citation statements)
References 43 publications (62 reference statements)

“…2 A related question is the statistical power of the examples generated from G. See Ref. [70] and [71] for discussions, and the latter paper for an empirical demonstration of this topic. 3 We are neglecting the contribution to the variance from f which should also be estimated from the sample.…”
Section: B Statistical Properties of Weighted Examples (mentioning)
confidence: 99%
“…Note that although the estimates of F, U, and σ syst /σ stat from a preliminary dataset will not be accurate up to the sensitivity offered by the full MC dataset [50], they will be sufficient for projecting, with sufficient accuracy, the sensitivity of the experiment under different sampling distributions.…”
Section: OASIS for Analysis Variables, 3.1 Groundwork (mentioning)
confidence: 99%
“…This condition needs to only be satisfied almost everywhere for optimal estimation of F 4. An alternative approach, motivated by recent advances in Machine Learning (ML), replaces the exact distribution f with an (approximate) ML regressor trained on a small sample of events produced from f[47][48][49], leading to an increase in the rate but not necessarily the precision of the simulation[50]. These methods use points in the phase space for which f is computed, and not events sampled from f .…”
mentioning
confidence: 99%
“…A seemingly straightforward and intuitive answer to this question is as many examples as were used for training, because the network does not add any physics knowledge [39]. However, there are reasons to think that a generative model actually contains more statistical power than the original data set.…”
Section: Introduction (mentioning)
confidence: 99%

GANplifying Event Samples
Butter, Diefenbacher, Kasieczka et al. 2020 (Preprint)