2022
DOI: 10.48550/arxiv.2207.00099
Preprint

Measuring Forgetting of Memorized Training Examples

Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples…
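The abstract quantifies forgetting through declining susceptibility to privacy attacks. As a rough illustration only, and not necessarily the paper's own protocol, the simplest such attack is loss-threshold membership inference: score each example by the model's loss on it and flag low-loss examples as likely training members. Below is a minimal sketch assuming a Hugging Face-style causal language model; the function name and the threshold-calibration step are assumptions.

```python
import torch

def example_loss(model, tokenizer, text, device="cpu"):
    """Per-example causal LM loss. A lower loss on a training example is the
    classic membership-inference signal; as a model "forgets" an example,
    this score drifts toward the loss it assigns to unseen data."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    model.eval()
    with torch.no_grad():
        # Hugging Face causal LMs shift labels internally and return the
        # mean cross-entropy over predicted tokens as `loss`.
        out = model(ids, labels=ids)
    return out.loss.item()

# Usage sketch: flag an example as a likely training member if its loss
# falls below a threshold calibrated on held-out (non-member) text.
#   threshold = median(example_loss(model, tok, t) for t in non_member_texts)
#   is_member = example_loss(model, tok, candidate) < threshold
```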

Cited by 2 publications (2 citation statements)
References 48 publications
“…Memorization in LLMs has been studied extensively as it comes with significant privacy and legal concerns (Carlini et al., 2021; 2022b; Jagielski et al., 2022; Tirumala et al., 2022). Pretrained LLMs have a tendency to regurgitate verbatim some of the training data, and their memorization is proportional to their number of parameters (Carlini et al., 2021; Lee et al., 2021).…”
Section: Memorization in Large Language Models (mentioning, confidence: 99%)
“…This memorization, albeit unintended, is crucial to model natural language and for generalisation (Zhang et al., 2021b; Feldman, 2020). A range of metrics have been proposed to compute the level of memorization across data points in causal language modelling (Tirumala et al., 2022; Carlini et al., 2022b; Jagielski et al., 2022), based on the likelihood of emitting textual sequences verbatim.…”
Section: Memorization in Large Language Models (mentioning, confidence: 99%)
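The statement above refers to metrics based on the likelihood of emitting training sequences verbatim. A minimal sketch of that idea, in the spirit of the extractability-style definitions in Carlini et al. (2022b) and Tirumala et al. (2022): prompt the model with a prefix of a tokenized training sequence and test whether greedy decoding reproduces the true continuation. The prefix/suffix lengths, decoding settings, and helper name here are assumptions; the cited papers differ in their exact criteria.

```python
import torch

def greedy_reproduces_suffix(model, sequence_ids, prefix_len=50, suffix_len=50):
    """Returns True if greedy decoding from a prefix of a (tokenized)
    training sequence emits the true continuation token-for-token."""
    prefix = list(sequence_ids[:prefix_len])
    target = list(sequence_ids[prefix_len:prefix_len + suffix_len])
    input_ids = torch.tensor([prefix])
    with torch.no_grad():
        generated = model.generate(
            input_ids,
            max_new_tokens=len(target),
            do_sample=False,  # greedy decoding: the model's most likely continuation
        )
    # Decoder-only models return prompt + new tokens; slice off the prompt.
    continuation = generated[0, prefix_len:prefix_len + len(target)].tolist()
    return continuation == target

# The fraction of training sequences for which this returns True gives a
# simple corpus-level verbatim-memorization rate.
```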