Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.354

On Biasing Transformer Attention Towards Monotonicity

Abstract: Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, an…
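
The abstract describes a monotonicity loss that can be added on top of standard soft attention. As a rough sketch only (the function name, hinge formulation, margin, and loss weight below are assumptions, not the paper's exact formulation), one way to bias attention toward monotonic alignments is to penalize backward movement of the expected source position across target steps:

```python
# A minimal sketch, assuming PyTorch; this is an illustrative penalty, not the
# exact monotonicity loss defined by Rios et al. (2021).
import torch


def monotonicity_loss(attn: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """Penalize attention whose focus moves backwards over the source.

    attn: (batch, tgt_len, src_len) soft attention weights, each row sums to 1.
    Computes the expected source position at every target step and applies a
    hinge penalty whenever that position decreases from one step to the next.
    """
    src_len = attn.size(-1)
    positions = torch.arange(src_len, dtype=attn.dtype, device=attn.device)
    expected_pos = (attn * positions).sum(dim=-1)          # (batch, tgt_len)
    backward_jump = expected_pos[:, :-1] - expected_pos[:, 1:] - margin
    return torch.relu(backward_jump).mean()


# Illustrative usage: combine with the task loss using a small, tuned weight.
# `attn` could be one attention head, or an average over the subset of heads
# being biased toward monotonic behavior.
# total_loss = task_loss + 0.1 * monotonicity_loss(attn)
```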

Cited by 7 publications (5 citation statements)
References 23 publications

“…In this work, we have evaluated the most common and most popular model architectures, but it would be interesting to test specific model architectures for character transduction tasks, e.g. models that put some monotonicity constraint on the attention mechanism (Wu and Cotterell, 2019; Rios et al., 2021). We defer this to future work.…”
Section: Discussion (mentioning)
confidence: 99%
“…Furthermore, specific model architectures for character transduction tasks have been proposed, e.g. constraining the attention to be monotonic (Wu and Cotterell, 2019; Rios et al., 2021). We did not include such architectures in our experiments since they generally only showed marginal improvements.…”
Section: Limitations (mentioning)
confidence: 99%
“…Another modification to the transformer architecture, which can improve performance on morphology tasks, is to add a so-called monotonicity loss (Rios et al., 2021). This can bias the transformer toward near-monotonic alignment between the input and output sequence, which is often the case in inflection.…”
Section: Related Work (mentioning)
confidence: 99%
“…Isolated morphological analysis and reinflection have long been of interest to the NLP community, with yearly shared tasks (Kurimo et al., 2010; Vylomova et al., 2020) that result in dedicated architectures that perform well for many languages (Aharoni and Goldberg, 2017; Makarov and Clematide, 2018; Wu and Cotterell, 2019; Rios et al., 2021). However, despite the large morphological diversity of natural languages, many approaches for sentence-level sequence-to-sequence tasks are often only tested on a subset of (morphologically similar) languages and then adopted without much questioning (Bender, 2011).…”
Section: Related Work (mentioning)
confidence: 99%