2023
DOI: 10.48550/arxiv.2301.13304
Preprint

Understanding Self-Distillation in the Presence of Label Noise

Abstract: Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the same architecture. Specifically, the student's objective function is ξ · ℓ(teacher's predictions, student's predictions) + (1 − ξ) · ℓ(given labels, student's predictions), where ℓ is some loss function and ξ ∈ [0, 1] is some parameter. Empirically, SD has been observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect …
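The weighted objective in the abstract is simple to implement. Below is a minimal sketch assuming a PyTorch-style setup; the function name self_distillation_loss, the argument names, and the choice of cross-entropy as ℓ are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_probs, labels, xi):
    """SD objective: xi * l(teacher, student) + (1 - xi) * l(labels, student),
    with l taken to be cross-entropy (an assumption for this sketch)."""
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy against the teacher's soft predictions.
    teacher_term = -(teacher_probs * log_probs).sum(dim=-1).mean()
    # Standard cross-entropy against the given (possibly noisy) labels.
    label_term = F.cross_entropy(student_logits, labels)
    return xi * teacher_term + (1 - xi) * label_term

# xi = 1 recovers pure distillation from the teacher;
# xi = 0 recovers ordinary training on the given labels.
```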

Cited by 0 publications
References 10 publications