Training Subset Selection for Weak Supervision

Lang, Hunter; Vijayaraghavan, Aravindan; Sontag, David

doi:10.48550/arxiv.2206.02914

Cited by 1 publication

(1 citation statement)

References 0 publications

“…In the semi-supervised setting, many approaches (Dehghani et al., 2017; Lang et al., 2022; Kimura et al., 2018) propose filtering-out or reweighting the teacher's pseudo-labels based on measures of teacher's uncertainty, such as dropout variance, entropy, margin-score, or the cut-statistic. These methods are independent of the student model and can be synergistically combined with our technique.…”

Section: Knowledge Distillation Techniques (mentioning)

confidence: 99%

SLaM: Student-Label Mixing for Semi-Supervised Knowledge Distillation

Kontonis, Iliopoulos, Trinh, et al. 2023

Preprint

Semi-supervised knowledge distillation is a powerful training paradigm for generating compact and lightweight student models in settings where the amount of labeled data is limited but one has access to a large pool of unlabeled data. The idea is that a large teacher model is utilized to generate "smoothed" pseudo-labels for the unlabeled dataset, which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for semi-supervised knowledge distillation that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called "forward loss-adjustment" methods.
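
The citation statement above mentions filtering the teacher's pseudo-labels by uncertainty measures such as entropy or margin score. The following is a minimal illustrative sketch of that general idea only; it is not the method of SLaM or of any of the cited papers. The function names, the example probabilities, and the thresholds `max_entropy` and `min_margin` are all hypothetical, and dropout variance and the cut statistic are not shown.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs):
    """Gap between the two largest predicted class probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def filter_pseudo_labels(teacher_probs, max_entropy=0.5, min_margin=0.3):
    """Keep (index, pseudo_label) pairs where the teacher looks confident.

    Both thresholds are hypothetical values chosen for illustration.
    """
    kept = []
    for i, probs in enumerate(teacher_probs):
        if entropy(probs) <= max_entropy and margin(probs) >= min_margin:
            label = max(range(len(probs)), key=probs.__getitem__)
            kept.append((i, label))
    return kept


# Hypothetical teacher outputs for three unlabeled examples.
teacher_probs = [
    [0.95, 0.03, 0.02],  # confident: low entropy, large margin
    [0.40, 0.35, 0.25],  # uncertain: high entropy, small margin
    [0.05, 0.90, 0.05],  # confident
]
kept = filter_pseudo_labels(teacher_probs)
```

Because the filter depends only on the teacher's outputs, it can be applied before any student training, which matches the statement's remark that such methods are independent of the student model.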
