2019
DOI: 10.48550/arxiv.1905.09768
Preprint

Zero-shot Knowledge Transfer via Adversarial Belief Matching

Abstract: Performing knowledge transfer from a large teacher network to a smaller student is a popular task in modern deep learning applications. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. We propose a novel method which trains a student to match the predictions of its teacher without using any data or metadata. We achieve this by training an adversarial generator to search for images on which the student…
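The abstract describes an alternating game: a generator searches for inputs on which student and teacher disagree, and the student is then trained to agree with the teacher on exactly those inputs. Below is a minimal sketch of that loop, assuming PyTorch; the model definitions, step ratio, and hyperparameters are illustrative placeholders rather than the authors' exact configuration, and the attention-transfer term the paper adds to the student loss is omitted.

```python
# Minimal sketch of zero-shot knowledge transfer via adversarial belief
# matching. Models, batch sizes, and step counts are illustrative
# placeholders, not the authors' exact configuration.
import torch
import torch.nn.functional as F

def belief_mismatch(x, teacher, student, T=1.0):
    """KL divergence between teacher and student predictions on x."""
    t_prob = F.softmax(teacher(x) / T, dim=1)
    s_logp = F.log_softmax(student(x) / T, dim=1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean")

def zero_shot_step(generator, teacher, student, g_opt, s_opt,
                   z_dim=100, batch=128, n_s=10, device="cpu"):
    """One outer iteration: 1 generator step, then n_s student steps.
    The teacher is assumed frozen (eval mode, requires_grad=False)."""
    # Generator: *maximize* the teacher-student KL, i.e. search for
    # images on which the student matches the teacher poorly.
    z = torch.randn(batch, z_dim, device=device)
    loss_g = -belief_mismatch(generator(z), teacher, student)
    g_opt.zero_grad()
    loss_g.backward()
    g_opt.step()

    # Student: *minimize* the same KL on freshly generated images,
    # so its beliefs move toward the teacher's.
    for _ in range(n_s):
        z = torch.randn(batch, z_dim, device=device)
        x = generator(z).detach()
        loss_s = belief_mismatch(x, teacher, student)
        s_opt.zero_grad()
        loss_s.backward()
        s_opt.step()
```

Taking several student steps per generator step keeps the student close to the moving adversarial distribution; the 1:10 ratio here is an assumption for illustration.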

Cited by 12 publications (22 citation statements) · References 12 publications

“…One drawback of FedDF and FedBE is that the aggregation server needs to have its own unlabeled dataset. However, this can be overcome using zero-shot learning, where synthetic data can be generated for this purpose Micaelli and Storkey [2019], Nayak et al…”
Section: Model Poisoning Attacks (mentioning)
confidence: 99%
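The statement above suggests replacing the aggregation server's unlabeled transfer set with generated data. A hedged sketch of what that substitution could look like, assuming PyTorch and FedDF-style ensemble distillation; the function names, the logit-averaging scheme, and all hyperparameters are assumptions for illustration, not the cited papers' procedure.

```python
# Illustrative sketch: server-side ensemble distillation (FedDF-style)
# where the unlabeled transfer set is replaced by synthetic data from a
# generator. Names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def distill_on_synthetic(generator, client_models, server_model, opt,
                         z_dim=100, batch=64, steps=100, device="cpu"):
    for _ in range(steps):
        z = torch.randn(batch, z_dim, device=device)
        x = generator(z).detach()  # synthetic transfer batch
        with torch.no_grad():
            # Ensemble "teacher": average the client models' logits.
            t_logits = torch.stack([m(x) for m in client_models]).mean(0)
        s_logp = F.log_softmax(server_model(x), dim=1)
        loss = F.kl_div(s_logp, F.softmax(t_logits, dim=1),
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
```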
“…Baselines. Two types of DFKD methods are compared in our experiments: (1) generative methods that train a generative model for synthesis, including DAFL (Chen et al. 2019), ZSKT (Micaelli and Storkey 2019), DFQ (Choi et al. 2020), and Generative DFD (Luo et al. 2020); and (2) non-generative methods that craft a transfer set in a batch-by-batch manner, including DeepInv (Yin et al. 2019) and CMI (Fang et al. 2021b).…”
Section: Experimental Settings (mentioning)
confidence: 99%
“…Nayak et al. [20] proposed a similar scheme but with a zero-shot approach, modeling the output space of the teacher model as a Dirichlet distribution whose parameters are taken from the model weights. More recent studies have employed generator architectures similar to GANs [21] to generate synthetic samples that replace the original data [22,23,24,25]. In the absence of the original training data, DAFL [22] used the teacher model to replace the discriminator, encouraging the outputs to be close to a one-hot distribution and maximizing the activation counts.…”
Section: Data-free Compression (mentioning)
confidence: 99%
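The DAFL description above names two concrete generator objectives: a one-hot loss that pushes the teacher's outputs toward confident one-hot predictions, and a term that maximizes the teacher's feature activations. A hedged sketch of those two terms, assuming PyTorch; the hook wiring, loss weight, and function names are illustrative assumptions (DAFL itself also uses an information-entropy term not shown here).

```python
# Sketch of DAFL-style generator objectives: one-hot loss plus
# activation maximization on a teacher feature layer. Names, weights,
# and the hook wiring are illustrative assumptions.
import torch.nn.functional as F

def dafl_generator_loss(x, teacher, feature_layer, alpha=0.1):
    feats = {}
    hook = feature_layer.register_forward_hook(
        lambda module, inputs, output: feats.update(a=output))
    logits = teacher(x)
    hook.remove()

    # One-hot loss: cross-entropy of the teacher's outputs against its
    # own argmax labels, so generated samples yield confident predictions.
    loss_onehot = F.cross_entropy(logits, logits.argmax(dim=1))

    # Activation loss: maximize feature activations (minimize their
    # negative mean absolute value).
    loss_act = -feats["a"].abs().mean()

    return loss_onehot + alpha * loss_act
```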
“…KegNet [26] adopted a similar idea and used a low-rank decomposition to aid the compression. Adversarial belief matching [23] and data-free adversarial distillation [24] suggested adversarially training the generator so that the generated samples become harder for the student to learn from. Another variant modifies samples directly using logit maximization, as in DeepInversion [25].…”
Section: Data-free Compression (mentioning)
confidence: 99%
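For the DeepInversion-style variant mentioned last, samples are optimized directly rather than produced by a generator. A minimal sketch of logit maximization by gradient ascent on the input batch, assuming PyTorch; the full method also regularizes with image priors and batch-norm statistics, which are omitted here, and all names are illustrative.

```python
# Sketch of direct sample optimization by logit maximization, in the
# spirit of DeepInversion (image priors and batch-norm regularizers
# omitted). Names and hyperparameters are illustrative assumptions.
import torch

def invert_samples(teacher, targets, shape=(3, 32, 32),
                   steps=200, lr=0.1, device="cpu"):
    # Start from noise and ascend the target-class logits.
    x = torch.randn(len(targets), *shape, device=device,
                    requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = teacher(x)
        loss = -logits.gather(1, targets.view(-1, 1)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()

# Usage (hypothetical): synthesize 64 CIFAR-10-shaped samples.
# x_syn = invert_samples(teacher, torch.randint(0, 10, (64,)))
```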