2022
DOI: 10.48550/arxiv.2201.08539
Preprint
AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models

Abstract: Recently, large pre-trained models have significantly improved the performance of various Natural Language Processing (NLP) tasks but they are expensive to serve due to long serving latency and large memory usage. To compress these models, knowledge distillation has attracted an increasing amount of interest as one of the most effective methods for model compression. However, existing distillation methods have not yet addressed the unique challenges of model serving in datacenters, such as handling fast evolvi…

Cited by 2 publications (2 citation statements)
References 27 publications (67 reference statements)
“…AutoDistill [222] generates a task-agnostic BERT model for the target hardware platform by considering several objectives, constraints, design space, evaluation metrics, and hardware performance. The framework operates in a loop fashion, in a series of three steps: Model Exploration, Flash Distillation, and Evaluation.…”
Section: K. AutoDistill
confidence: 99%
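The three-step loop described in the citation statement above (Model Exploration, Flash Distillation, Evaluation) can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the function names, the toy proxy quality in `flash_distill`, and the latency estimate in `evaluate` are all assumptions made here for clarity.

```python
import random

def explore(design_space, rng):
    # Model Exploration: sample one candidate architecture from the design space.
    return {k: rng.choice(v) for k, v in design_space.items()}

def flash_distill(candidate):
    # Flash Distillation stand-in: a toy proxy score in place of a short
    # distillation run against the teacher (higher is better; peaks at 6 layers).
    return 1.0 / (1 + abs(candidate["layers"] - 6))

def evaluate(candidate, quality):
    # Evaluation: combine proxy quality with a toy hardware-cost estimate.
    latency = candidate["layers"] * candidate["hidden"] / 1000.0
    return quality - 0.01 * latency

def search(design_space, rounds=20, seed=0):
    # Run the explore -> distill -> evaluate loop and keep the best candidate.
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        cand = explore(design_space, rng)
        score = evaluate(cand, flash_distill(cand))
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

space = {"layers": [4, 6, 12], "hidden": [256, 512, 768]}
best, score = search(space)
print(best, round(score, 3))
```

The key design point the framework's loop structure captures is that candidates are scored on both objectives jointly, so the search can trade a little model quality for better hardware efficiency.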
“…Inspired by the co-design strategies, researchers started to consider hardware efficiency for DNN design through NAS [40]- [45]. Different from the conventional NAS, which only focuses on delivering highly accurate models, hardware-aware NAS incorporates hardware metrics into the objective [40]- [43], [45]. Researchers also designed differentiable search algorithms to speed up NAS.…”
Section: Related Work
confidence: 99%
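The contrast the citation statement draws between conventional and hardware-aware NAS objectives can be illustrated with a small sketch. This is not the cited methods' code; the multiplicative latency-penalty form and the `target_ms` and `alpha` values are assumptions, chosen to resemble the common `(target/latency)^alpha` reward-scaling style used in hardware-aware search.

```python
def conventional_objective(accuracy):
    # Conventional NAS: optimize model accuracy alone.
    return accuracy

def hardware_aware_objective(accuracy, latency_ms, target_ms=10.0, alpha=0.07):
    # Hardware-aware NAS: fold a hardware metric (here, latency) into the
    # objective, rewarding candidates faster than the target and penalizing
    # slower ones.
    return accuracy * (target_ms / latency_ms) ** alpha

# A faster but slightly less accurate model can win under the
# hardware-aware objective while losing under the conventional one.
slow = hardware_aware_objective(0.90, latency_ms=20.0)
fast = hardware_aware_objective(0.88, latency_ms=8.0)
print(fast > slow)  # the fast model scores higher once latency counts
```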