2022
DOI: 10.48550/arxiv.2205.10770
Preprint
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Abstract: Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood. We empirically study exact memorization in causal and masked language modeling, across model sizes and throughout the training process. We measure the effects of dataset size, learning rate, and model size on memorization, finding that larger language models memorize training data faster across all settings. Surprisingly, we show that larger models can memorize a larger portion o…

Cited by 13 publications (12 citation statements)
References 51 publications
“…Recent studies show that large language models can memorize their training data and generate texts from the training data given certain prompts (Kharitonov et al., 2021; Thakkar et al., 2020; Carlini et al., 2019; Tirumala et al., 2022). Most related to our work, Carlini et al. (2022) found that the memorization ability of LLMs grows significantly with model capacity, with the number of times an example has been duplicated, and with the number of tokens of context used to prompt the model.…”
Section: Memorization In Large Language Models — supporting
confidence: 56%
“…MA quantifies how much f_θ has memorized the given token sequences; it was proposed by Tirumala et al. (2022) to analyze the training dynamics of large LMs.…”
Section: Extraction Likelihood (EL) — mentioning
confidence: 99%
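The per-token metric quoted above can be sketched concretely. Under the paper's notion of exact memorization, a token counts as memorized when the model's greedy (argmax) prediction at that position equals the actual training token; the function name and array shapes below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def memorization_accuracy(pred_logits: np.ndarray, target_ids: np.ndarray) -> float:
    """Fraction of positions where the model's argmax prediction
    equals the ground-truth training token.

    pred_logits: (seq_len, vocab_size) array of per-position logits.
    target_ids:  (seq_len,) array of training-token ids.
    """
    greedy = np.argmax(pred_logits, axis=-1)   # greedy prediction per position
    return float(np.mean(greedy == target_ids))

# Toy example: the argmax predictions are [1, 2, 3, 0],
# matching 3 of the 4 target tokens.
logits = np.eye(5)[[1, 2, 3, 0]]
targets = np.array([1, 2, 3, 4])
print(memorization_accuracy(logits, targets))  # → 0.75
```

In practice the same computation is averaged over many training sequences, and the context fed to the model differs between causal and masked language modeling.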
“…This definition and variations of it have been used widely in the literature (Kandpal et al., 2022; Lee et al., 2021; Carlini et al., 2022). For example, Tirumala et al. (2022) study a similar per-token definition called exact memorization, and Kandpal et al. (2022) a document-level definition called perfect memorization.…”
Section: Measuring Verbatim Memorization — mentioning
confidence: 99%