Leaving Reality to Imagination: Robust Classification via Generated Datasets

Bansal, Hritik; Grover, Aditya

doi:10.48550/arxiv.2302.02503

Cited by 1 publication

(1 citation statement)

References 45 publications

“…Today's AI models use Internet-scraped data, and thus unwittingly train on synthetic data (Figure 2). Moreover, AI-synthesized data is increasingly popular [5][6][7][8][9][10] because it is convenient [11,12], anonymous [13][14][15][16], can augment real data [17,18], and can match AI models' ever-increasing sizes [19][20][21].…”

Section: Introduction (citation type: mentioning)

confidence: 99%

Self-Consuming Generative Models go MAD

Casco-Rodriguez, Alemohammad, Luzi, et al. 2023

LatinX in AI at Neural Information Processing Systems Conference 2023

Seismic advances in generative AI algorithms have led to the temptation to use AI-synthesized data to train next-generation models. Repeating this process creates autophagous (“self-consuming”) loops whose properties are poorly understood. We conduct a thorough analysis using state-of-the-art generative image models of three autophagous loop families that differ in how they incorporate fixed or fresh real training data and whether previous generations' samples have been biased to trade off data quality versus diversity. Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD) and show that appreciable MADness arises in just a few generations.
