Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023
DOI: 10.18653/v1/2023.emnlp-main.362
|View full text |Cite
|
Sign up to set email alerts
|

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding

Sangmin Bae,
Jongwoo Ko,
Hwanjun Song
et al.

Abstract: To tackle the high inference latency exhibited by autoregressive language models, previous studies have proposed an early-exiting framework that allocates adaptive computation paths for each token based on the complexity of generating the subsequent token. However, we observed several shortcomings, including performance degradation caused by a state copying mechanism or numerous exit paths, and sensitivity to exit confidence thresholds. Consequently, we propose a Fast and Robust Early-Exiting (FREE) framework,… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2024
2024
2024
2024

Publication Types

Select...
3

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
references
References 29 publications
0
0
0
Order By: Relevance