2020 · Preprint
DOI: 10.48550/arxiv.2009.13270

Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation

Abstract: Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing sparse Transformers, we find that complex semantic information is first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts. Meanwhile, early layers of sparse models b…
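The probing methodology mentioned in the abstract trains small classifiers on frozen layer activations and tracks how probe accuracy changes as the model is pruned. Below is a minimal sketch of such a layer-wise probe, assuming a HuggingFace-style Transformer that exposes per-layer hidden states; the function names, mean-pooling choice, and training loop are illustrative and not the paper's actual probing setup.

```python
import torch
import torch.nn as nn

def layer_features(model, token_ids, layer_idx):
    """Mean-pool the hidden states of one layer of a frozen Transformer.
    Assumes a HuggingFace-style forward() with output_hidden_states=True."""
    with torch.no_grad():
        hidden = model(token_ids, output_hidden_states=True).hidden_states
    return hidden[layer_idx].mean(dim=1)          # (batch, d_model)

def train_probe(features, labels, num_classes, epochs=50, lr=1e-2):
    """Fit a linear probe on frozen features; probe accuracy indicates how much
    task-relevant information the layer still encodes after pruning."""
    probe = nn.Linear(features.size(-1), num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(probe(features), labels)
        loss.backward()
        opt.step()
    return probe
```

Comparing probe accuracy at the same layer of a dense and a pruned model, over the same inputs, is one way to quantify which kinds of information degrade first under pruning.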

Cited by 4 publications (7 citation statements)
References 22 publications (35 reference statements)
“…LTH (Frankle and Carbin, 2018) has been widely explored in various applications of deep learning (Brix et al, 2020; Movva and Zhao, 2020; Girish et al, 2020). Most existing results focus on finding unstructured winning tickets via iterative magnitude pruning and rewinding in randomly initialized networks (Frankle et al, 2019; Renda et al, 2020), where each ticket is a single neuron.…”
Section: Structured and Unstructured LTHs (mentioning)
confidence: 99%
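The iterative magnitude pruning (IMP) and rewinding procedure referenced in this statement works roughly as follows: train briefly and save a rewind checkpoint, then repeatedly train under the current mask, prune the smallest-magnitude surviving weights, and rewind the survivors before retraining. A minimal per-tensor sketch in PyTorch is shown below; `train_fn`, the pruning fraction, and the step counts are placeholders rather than the cited papers' exact recipe (which typically also prunes globally across tensors).

```python
import copy
import torch

def imp_with_rewinding(model, train_fn, rounds=5, prune_frac=0.2,
                       rewind_steps=1000, train_steps=100_000):
    """Sketch of iterative magnitude pruning with weight rewinding.
    train_fn(model, steps, masks) is assumed to train the model for `steps`
    updates while keeping masked-out weights at zero."""
    masks = {name: torch.ones_like(p, dtype=torch.bool)
             for name, p in model.named_parameters()}
    train_fn(model, steps=rewind_steps, masks=masks)          # brief warm-up
    rewind_state = copy.deepcopy(model.state_dict())          # rewind point

    for _ in range(rounds):
        train_fn(model, steps=train_steps, masks=masks)       # train under the current mask
        for name, param in model.named_parameters():
            alive = param.detach().abs()[masks[name]]
            if alive.numel() == 0:
                continue
            threshold = torch.quantile(alive, prune_frac)     # magnitude criterion
            masks[name] &= param.detach().abs() > threshold   # drop smallest surviving weights
        model.load_state_dict(rewind_state)                   # rewind surviving weights
        with torch.no_grad():
            for name, param in model.named_parameters():
                param.mul_(masks[name])                       # zero out pruned weights
    return model, masks
```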
“…of such a collection of tickets, which is usually referred to as "winning tickets", indicates the potential of training a smaller network to achieve the full model's performance. LTH has been widely explored across various fields of deep learning (Frankle et al, 2019; You et al, 2019; Brix et al, 2020; Movva and Zhao, 2020; Girish et al, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
“…The Transformer architecture (Vaswani et al, 2017) became the backbone of state-of-the-art models in a variety of tasks (Raffel et al, 2019; Adiwardana et al, 2020; Brown et al, 2020). This spurred significant interest in better understanding the inner workings of these models (Vig and Belinkov, 2019; Clark et al, 2019; Kharitonov and Chaabouni, 2020; Hahn, 2020; Movva and Zhao, 2020; Chaabouni et al, 2021; Merrill et al, 2021; Sinha et al, 2021). Most of these works have focused specifically on how models generalize and capture structure across samples that are similar.…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous studies [3, 17] have shown that pruned neural networks evolve to substantially different representations while striving to preserve overall accuracy. In Section 3, we have demonstrated that knowledge distillation can effectively mitigate both pruning- and data-induced bias in compressed networks.…”
Section: Explaining Model Bias Using Model Similarity (mentioning)
confidence: 99%
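For reference, the knowledge distillation referred to here is the standard setup in which a compressed student matches a dense teacher's softened output distribution in addition to the hard labels. A minimal sketch of that loss follows; the temperature and mixing weight are illustrative, not the cited paper's values.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Hinton-style knowledge distillation: KL between softened teacher and
    student distributions, mixed with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```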
“…Literature on network pruning has historically focused on accuracy [14, 5], with recent work on robustness [7, 22]. Movva and Zhao [17] investigated the impact of pruning on layer similarities of NLP models using LinearCKA [12]. Ansuini et al and Blakeney et al also investigated how pruning can change representations using similarity-based measures [2, 3].…”
Section: Related Work (mentioning)
confidence: 99%
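The LinearCKA measure cited above scores the similarity of two activation matrices collected over the same inputs, for example a dense and a pruned encoder's outputs at the same layer. A minimal sketch of linear CKA (Kornblith et al., 2019) is given below; this is an illustration, not the cited reference implementation.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    """Linear Centered Kernel Alignment between activation matrices
    x, y of shape (num_examples, dim); returns a similarity in [0, 1]."""
    x = x - x.mean(dim=0, keepdim=True)   # center each feature dimension
    y = y - y.mean(dim=0, keepdim=True)
    cross = (y.T @ x).pow(2).sum()        # ||Y^T X||_F^2
    norm_x = torch.norm(x.T @ x)          # ||X^T X||_F
    norm_y = torch.norm(y.T @ y)          # ||Y^T Y||_F
    return (cross / (norm_x * norm_y)).item()
```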