Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.686
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Abstract: Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to deploy on hardware due to their intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with arbitrary encoder-decoder attention and heterogeneous layers. Then we train a Super-Transformer that covers all candidates in the design space, and ef…
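The abstract describes a weight-shared Super-Transformer over a heterogeneous design space. Below is a minimal, hypothetical sketch of sampling one Sub-Transformer configuration from such a space; the dimension ranges and field names are illustrative assumptions, not the authors' code or exact search space.

```python
import random

# Hypothetical design space, loosely in the spirit of the choices HAT describes:
# embedding dims, per-layer FFN dims and head counts, decoder depth, and how many
# encoder layers each decoder layer attends to ("arbitrary encoder-decoder attention").
DESIGN_SPACE = {
    "encoder_embed_dim": [512, 640],
    "decoder_embed_dim": [512, 640],
    "encoder_layers": [6],
    "decoder_layers": [1, 2, 3, 4, 5, 6],
    "ffn_dim": [1024, 2048, 3072],
    "num_heads": [4, 8],
    "attn_span": [1, 2, 3],
}

def sample_subtransformer(space, rng=random):
    """Sample one Sub-Transformer configuration (a candidate architecture)."""
    cfg = {
        "encoder_embed_dim": rng.choice(space["encoder_embed_dim"]),
        "decoder_embed_dim": rng.choice(space["decoder_embed_dim"]),
        "encoder_layers": rng.choice(space["encoder_layers"]),
        "decoder_layers": rng.choice(space["decoder_layers"]),
    }
    # Heterogeneous layers: each layer draws its own FFN dim and head count.
    cfg["encoder_ffn_dims"] = [rng.choice(space["ffn_dim"]) for _ in range(cfg["encoder_layers"])]
    cfg["decoder_ffn_dims"] = [rng.choice(space["ffn_dim"]) for _ in range(cfg["decoder_layers"])]
    cfg["decoder_heads"] = [rng.choice(space["num_heads"]) for _ in range(cfg["decoder_layers"])]
    cfg["decoder_attn_span"] = [rng.choice(space["attn_span"]) for _ in range(cfg["decoder_layers"])]
    return cfg

if __name__ == "__main__":
    print(sample_subtransformer(DESIGN_SPACE))
```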

Citations: Cited by 164 publications (114 citation statements)
References: 29 publications
“…We use Lat(•) to predict the latency of the candidates to filter out the candidates that do not meet the latency constraint. Lat(•) is built with the method by Wang et al. (2020a), which first samples about 10k architectures from A and collects their inference time on target devices, and then uses a feed-forward network to fit the data. For more details of the evolutionary algorithm, please refer to Appendix C. Note that different methods can be used in the search process, such as random search or more advanced strategies, which we leave as future work.…”
Section: Search Process
confidence: 99%
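The citation statement above describes the latency predictor: sample roughly 10k architectures, measure their inference time on the target device, and fit a feed-forward regressor. A minimal sketch of that idea follows; the feature encoding, layer sizes, and training hyperparameters are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Small feed-forward network mapping an architecture feature vector to predicted latency."""
    def __init__(self, feature_dim: int, hidden_dim: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, arch_features: torch.Tensor) -> torch.Tensor:
        return self.net(arch_features).squeeze(-1)

def fit(predictor, feats, latencies, epochs=200, lr=1e-3):
    """feats: (N, feature_dim) encoded architectures; latencies: (N,) measured times."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(predictor(feats), latencies)
        loss.backward()
        opt.step()
    return predictor

# Usage with dummy data (replace with ~10k measured architecture/latency pairs):
feats = torch.rand(1000, 10)
lats = torch.rand(1000) * 200  # e.g. milliseconds on the target device
model = fit(LatencyPredictor(feature_dim=10), feats, lats)
```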
“…Moreover, we introduce HAT (Wang et al., 2020a) as a baseline of one-shot learning. HAT focuses on the search space of non-identical layer structures.…”
[Footnote: the first 16 models, https://github.com/google-research/bert, range from 2L128D to 8L768D.]
Section: Ablation Study of One-shot Learning
confidence: 99%
“…Traditional NAS methods (Zhu et al., 2020) use downstream task performance as the objective to search for task-specific models. Instead, similar to the work by Khetan and Karnin (2020)…”
Section: Evolutionary Search
confidence: 99%
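The citation statements above refer to an evolutionary search that filters candidates by a latency constraint before evolving them. Below is a generic, hypothetical sketch of such a loop; the function names (sample_fn, mutate_fn, fitness_fn, latency_fn) are placeholders for the sampler, mutation operator, accuracy proxy, and latency predictor, and are not taken from the paper's code.

```python
import random

def evolutionary_search(population_size, generations, sample_fn, mutate_fn,
                        fitness_fn, latency_fn, latency_constraint):
    """Keep only candidates whose predicted latency meets the constraint,
    then repeatedly select the fittest and mutate them."""
    # Seed the population with latency-feasible candidates.
    population = []
    while len(population) < population_size:
        cand = sample_fn()
        if latency_fn(cand) <= latency_constraint:
            population.append(cand)

    for _ in range(generations):
        # Select the top half as parents.
        parents = sorted(population, key=fitness_fn, reverse=True)[: population_size // 2]
        # Fill the rest of the population with latency-feasible mutations of parents.
        children = []
        while len(children) < population_size - len(parents):
            child = mutate_fn(random.choice(parents))
            if latency_fn(child) <= latency_constraint:
                children.append(child)
        population = parents + children

    return max(population, key=fitness_fn)
```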