2021
DOI: 10.48550/arxiv.2112.08560
Preprint

Block-Skim: Efficient Question Answering for Transformer

Abstract: Transformer models have achieved promising results on natural language processing (NLP) tasks including extractive question answering (QA). Common Transformer encoders used in NLP tasks process the hidden states of all input tokens in the context paragraph throughout all layers. However, unlike other tasks such as sequence classification, answering the raised question does not necessarily require all the tokens in the context paragraph. Following this motivation, we propose Block-Skim, which learns to skim…
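As a rough illustration of the idea in the abstract, the sketch below drops whole context blocks at intermediate layers so that later layers only process the tokens that survive. This is a minimal, hypothetical Python/NumPy sketch, not the paper's implementation; skim_forward, relevance_fn, the block layout, and the threshold are names and assumptions chosen here for illustration.

    import numpy as np

    def skim_forward(hidden, block_ids, layers, relevance_fn, threshold=0.5):
        """Toy encoder pass that skims (drops) context blocks mid-network.

        hidden       : (seq_len, dim) token hidden states
        block_ids    : (seq_len,) block index of each token
        layers       : list of callables mapping (n, dim) -> (n, dim)
        relevance_fn : hypothetical scorer returning a block relevance score

        In a real system the question tokens would be exempt from skimming;
        this toy version treats every block as skimmable.
        """
        keep = np.ones(hidden.shape[0], dtype=bool)
        for layer in layers:
            hidden[keep] = layer(hidden[keep])      # only surviving tokens are processed
            for b in np.unique(block_ids[keep]):
                mask = keep & (block_ids == b)
                if relevance_fn(hidden[mask]) < threshold:
                    keep &= ~mask                   # later layers never see this block
        return hidden, keep

    # Dummy usage: 4 blocks of 8 tokens, 4 random layers, a placeholder scorer.
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((16, 16)) / 4 for _ in range(4)]
    layers = [lambda x, W=W: np.tanh(x @ W) for W in weights]
    hidden = rng.standard_normal((32, 16))
    block_ids = np.repeat(np.arange(4), 8)
    out, kept = skim_forward(hidden, block_ids, layers,
                             relevance_fn=lambda h: float(np.abs(h).mean()),
                             threshold=0.3)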


Cited by 3 publications (3 citation statements) | References 25 publications
“…The context branch is preprocessed off-line and pruned at shallow layers. Also dedicated to QA tasks, Block-Skim (Guan et al., 2021) proposes to predict and skim the irrelevant context blocks by analyzing the attention weight patterns. Progressive Growth (Gu et al., 2021) randomly drops a portion of input tokens during training to achieve better pre-training efficiency.…”
Section: Related Work
mentioning, confidence: 99%
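The statement above attributes the skim decision to attention weight patterns. One hypothetical way to turn a layer's attention weights into per-block scores is sketched below; the pooling choice and the focus on attention paid by question/[CLS] tokens are assumptions made for illustration, not the cited paper's learned predictor.

    import numpy as np

    def block_relevance_from_attention(attn, block_ids, query_idx):
        """Average attention that question (or [CLS]) tokens pay to each block.

        attn      : (heads, seq_len, seq_len) attention weights of one layer
        block_ids : (seq_len,) block index per token
        query_idx : indices of the question/[CLS] tokens doing the looking
        """
        from_query = attn[:, query_idx, :].mean(axis=0)      # (|query|, seq_len)
        return {int(b): float(from_query[:, block_ids == b].mean())
                for b in np.unique(block_ids)}               # low score => skim candidate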
“…However, pruning will cause sparse irregular memory accesses. Therefore, pruning needs software (Gale et al., 2020; Guan et al., 2020; Qiu et al., 2019; Guo et al., 2020a; Guan et al., 2021; Fedus et al., 2021) and hardware (Gondimalla et al., 2019; Guo et al., 2020b; Zhang et al., 2020) optimization to accelerate.…”
Section: Related Work
mentioning, confidence: 99%
“…The principles of attaching exits vary. The mechanisms in [25, 37, 49, 54] directly place exits after each block of the transformer model, assuming the overhead of exits is small. In [12, 22, 29, 30, 43], the placement of exits is hand-crafted and depends on the model architecture.…”
Section: Multi-exit DNN Models
mentioning, confidence: 99%
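For the exit-after-every-block scheme mentioned in the last statement, a minimal sketch follows; the function names, shapes, and confidence threshold are illustrative and not taken from any of the cited systems.

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def early_exit_forward(x, blocks, exit_heads, threshold=0.9):
        """Run blocks in order; stop at the first exit head that is confident.

        blocks     : list of callables mapping a feature vector to a feature vector
        exit_heads : one classifier head per block, mapping features to class logits
        """
        for depth, (block, head) in enumerate(zip(blocks, exit_heads)):
            x = block(x)
            probs = softmax(head(x))
            if probs.max() >= threshold:             # confident enough: exit early
                return int(probs.argmax()), depth
        return int(probs.argmax()), len(blocks) - 1  # fell through to the final exit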