2020
DOI: 10.48550/arxiv.2011.14203
Preprint

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

Cited by 2 publications (1 citation statement)
References 65 publications

“…Other great ways of improving the efficiency of transformers include weight sharing across transformer blocks (Lan et al., 2019), dynamically controlling the attention span of each token (Tambe et al., 2020), and allowing the model to output the result in an earlier transformer block (Zhou et al., 2020; Schwartz et al., 2020). These techniques are orthogonal to our pruning-based method and have remained unexplored on vision models.…”
Section: ViT Compression Techniques
Confidence: 99%
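
The early-exit technique mentioned in the citation statement (letting the model produce its prediction at an earlier transformer block once it is confident enough) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not EdgeBERT's actual code: the `blocks`/`classifiers` module lists, the `[CLS]`-token pooling, and the `entropy_threshold` value are all hypothetical.

```python
import torch.nn.functional as F

def early_exit_forward(blocks, classifiers, hidden, entropy_threshold=0.4):
    """Entropy-based early exit over a stack of transformer blocks.

    Sketch only: `blocks` and `classifiers` are assumed to be equal-length
    lists of nn.Module objects (one lightweight classifier per block), and
    `hidden` has shape (batch, seq_len, dim). Not EdgeBERT's real interface.
    """
    logits = None
    for depth, (block, clf) in enumerate(zip(blocks, classifiers), start=1):
        hidden = block(hidden)                   # run one more transformer block
        logits = clf(hidden[:, 0])               # classify from the first ([CLS]) token
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
        if entropy.max() < entropy_threshold:    # confident enough -> stop early
            return logits, depth                 # prediction and the exit depth
    return logits, len(blocks)                   # no early exit: used the full stack
```

In EdgeBERT's setting, the predicted exit depth is what enables the sentence-level energy optimizations (e.g., scaling voltage and frequency to just meet the latency target), but that control loop is outside the scope of this sketch.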