2020
DOI: 10.48550/arxiv.2006.13561
Preprint

Differentiable Window for Dynamic Local Attention

Abstract: We propose Differentiable Window, a new neural module and general purpose component for dynamic window selection. While universally applicable, we demonstrate a compelling use case of utilizing Differentiable Window to improve standard attention modules by enabling more focused attentions over the input regions. We propose two variants of Differentiable Window, and integrate them within the Transformer architecture in two novel ways. We evaluate our proposed approach on a myriad of NLP tasks, including machine…

Cited by 2 publications (6 citation statements)
References 11 publications
“…Longformer [1] utilizes dilated sliding window attention to combine local and global information. And [25] enables more focused attentions by dynamic differentiable windows.…”
Section: Variable Attention
Citation type: mentioning
confidence: 99%
“…for the local attention. However, in [6], they utilize a differential window instead of a Gaussian mask. For that reason, we transfer the idea of their fusion process to our approach:…”
Section: Bias Attention Fusion
Citation type: mentioning
confidence: 99%
“…Since the improvement mentioned in [6,20] is located in the domain of machine translation and language modeling, it is not clear if the same application holds for the local attention in ASR. Therefore, we perform a short ablation study where to apply localness and to identify the most effective way to fuse the global score from Equation 10 and local score from Equation 11.…”
Section: Ablation Study
Citation type: mentioning
confidence: 99%