Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.8

BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression

Abstract: The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods to make exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting…
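The classification variant of early exiting that the paper builds on attaches a lightweight classifier to every transformer layer and stops the forward pass once some layer's prediction is confident enough. Below is a minimal PyTorch sketch of that inference loop; the `layers` and `heads` containers, the `[CLS]`-pooling choice, and the 0.9 threshold are illustrative assumptions, not the paper's actual interface.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def early_exit_classify(layers, heads, hidden, threshold=0.9):
    """layers: per-layer transformer blocks (callables on hidden states);
    heads: per-layer classifier heads; hidden: (1, seq_len, dim) embeddings.
    Assumes batch size 1 so a single confidence value decides the exit."""
    label = None
    for layer, head in zip(layers, heads):
        hidden = layer(hidden)                         # run one more layer
        probs = F.softmax(head(hidden[:, 0]), dim=-1)  # classify from [CLS]
        confidence, label = probs.max(dim=-1)
        if confidence.item() >= threshold:             # confident enough: exit
            break
    return label.item()
```

The paper's learning-to-exit module replaces the softmax-confidence test with a learned halting score, which is what lets the mechanism carry over to regression tasks where class probabilities are unavailable.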

Cited by 57 publications (61 citation statements) · References 20 publications
“…Our results have direct implications for the use of BERT as a knowledge base. By effectively choosing layers to query and adopting early exiting strategies (Xin et al., 2020, 2021), knowledge base completion can be improved. The performance of RANK-MSMARCO also warrants further investigation into ranking models with different training objectives: pointwise (regression) vs. pairwise vs. listwise.…”
Section: Discussion
confidence: 99%
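"Choosing layers to query" can be prototyped directly with the HuggingFace transformers library, which exposes every intermediate hidden state. The sketch below illustrates that access pattern only; the model name and the choice of layer 9 are arbitrary assumptions, not the cited papers' setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_hidden_states=True)

inputs = tok("Paris is the capital of France.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states holds 13 tensors: the embedding output plus one per
# layer. A probe or exit head can read any of them, not only the last.
intermediate = out.hidden_states[9]   # e.g., query layer 9 representations
```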
“…Before-prediction (Elbayad et al., 2020; Xin et al., 2021): take the features as input and generate a label deciding whether to execute the forward process.…”
Section: MLP(ŷ)
confidence: 99%
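As a concrete illustration of such a "before-prediction" decision module: a tiny learned gate reads the current layer's features and emits a halting probability. This is a hedged PyTorch sketch with invented names and sizes, not the cited implementations.

```python
import torch
import torch.nn as nn

class ExitGate(nn.Module):
    """Tiny halting module: reads pooled features from the current layer
    and scores whether the forward pass can stop here."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # one "safe to exit?" logit

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_dim); returns (batch,) exit probabilities.
        return torch.sigmoid(self.score(pooled)).squeeze(-1)
```

Because the gate scores halting directly instead of reading softmax confidence, the same mechanism applies to regression heads, where class probabilities do not exist.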
“…Mixture-of-experts-style dynamic networks are representative dynamic models (Lepikhin et al., 2021; Lin et al., 2021; Fedus et al., 2021). In those models, a layer contains multiple experts, and only a subset of these experts is activated for each instance.…”
Section: MLP(ŷ)
confidence: 99%
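A minimal sketch of that routing idea, with illustrative names and shapes: a linear router scores the experts for each token and only the top-k are evaluated, so most of the layer's parameters stay inactive for any given input. Production systems such as GShard and Switch Transformer add capacity limits and load-balancing losses that are omitted here.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores experts per token
        self.experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, dim); each token is routed to its top-k experts only.
        gates = torch.softmax(self.router(x), dim=-1)  # (n_tokens, n_experts)
        weights, chosen = gates.topk(self.k, dim=-1)   # (n_tokens, k)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = chosen[:, slot] == e            # tokens picking expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# usage: y = MoELayer(dim=768)(torch.randn(16, 768))
```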
“…Alternative methods are concerned with strategies for pruning the ensemble during or after the training phase [8,9,11], and with budget-aware learning-to-rank algorithms [1,13]. Furthermore, researchers have investigated early termination heuristics aimed at reducing, on a document- or query-level basis, the cost of the scoring process [3,12,15]. These works studied the impact of the proposed early termination strategies on both latency and ranking accuracy.…”
Section: Introduction
confidence: 99%
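To make the document-level flavor of early termination concrete, here is a hedged sketch (plain Python, names invented, not a specific algorithm from the cited work): an additive ensemble stops scoring a document as soon as an optimistic bound on its final score falls below the current top-k cutoff.

```python
def score_with_early_exit(trees, doc_features, cutoff, suffix_upper_bounds):
    """trees: callables mapping doc_features -> a partial score;
    suffix_upper_bounds[i]: max total contribution of trees[i:];
    cutoff: lowest score currently in the top-k results."""
    score = 0.0
    for i, tree in enumerate(trees):
        # Even if the remaining trees all contribute their maximum,
        # can this document still reach the cutoff? If not, stop now.
        if score + suffix_upper_bounds[i] < cutoff:
            return None   # document cannot make the top-k: exit early
        score += tree(doc_features)
    return score
```

The trade-off studied in these works is visible in the sketch: a looser bound (or lower cutoff) scores more trees and preserves ranking accuracy, while a tighter one terminates earlier and reduces latency.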