Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.147

Enhancing Machine Translation with Dependency-Aware Self-Attention

Abstract: Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism. In this work, we investigate different approaches to incorporate syntactic knowledge in the Transformer model and also propose a novel, parameter-free, dependency-aware self-attention mechanism that improves its translation quality, especially for long sentences and in low-resource scenarios. We show the efficacy of each approach on WMT English↔Germ…

Cited by 56 publications (47 citation statements) | References 27 publications
“…Wu et al (2019) introduce dynamic convolutions that predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements. To adjust attention weights beyond SAN, Shaw et al (2018) extend the self-attention mechanism to efficiently consider representations of the relative positions or distances between sequence elements by adding a relative position embedding to the key vectors; Bugliarello and Okazaki (2019) transform the distance between two nodes in the dependency tree with a pre-defined Gaussian weighting function and multiply the result with the key-query inner product value; Dai et al (2019) present a relative position encoding scheme that adds an additional relative position representation to the key-query computation. Sukhbaatar et al (2019a) propose a parameterized linear function over self-attention to learn the optimal attention span, significantly extending the maximum context size used in the Transformer.…”
Section: Related Work (mentioning)
confidence: 99%
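To make the weighting summarized in the excerpt above concrete, here is a minimal single-head NumPy sketch of the dependency-aware attention attributed to Bugliarello and Okazaki: the key-query scores of each token are multiplied by a pre-defined Gaussian of each key position's distance from that token's dependency parent, adding no learned parameters. Function and variable names, and the placement of the weighting before the softmax, are assumptions based on the wording of the excerpt, not the authors' reference implementation.

```python
import numpy as np

def parent_scaled_attention(Q, K, V, parent_pos, sigma=1.0):
    """Minimal single-head sketch of dependency-aware self-attention.

    Q, K, V    : (seq_len, d_k) query/key/value matrices for one head.
    parent_pos : (seq_len,) index of each token's dependency parent.
    sigma      : standard deviation of the pre-defined Gaussian weighting.
    """
    parent_pos = np.asarray(parent_pos)
    seq_len, d_k = Q.shape
    # Standard scaled dot-product (key-query) scores.
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len)
    # Distance of every key position j from each query token's parent.
    positions = np.arange(seq_len)
    dist = positions[None, :] - parent_pos[:, None]     # (seq_len, seq_len)
    # Parameter-free Gaussian weighting multiplied into the raw scores.
    gauss = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    weighted = scores * gauss
    # Softmax over the key dimension.
    weighted = weighted - weighted.max(axis=-1, keepdims=True)
    attn = np.exp(weighted)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ V                                     # (seq_len, d_k)
```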
“…Recently, the Transformer (Vaswani et al, 2017) has been widely applied in various natural language processing tasks, such as neural machine translation (Vaswani et al, 2017) and text summarization. To further improve the quality of text representations, Transformer-based variants have attracted a lot of attention (Sukhbaatar et al, 2019a,b; Bugliarello and Okazaki, 2019; Ma et al, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
“…Some researchers have proposed syntax-aware self-attentions that are trained using dependency-based constraints. For instance, Wang et al (2019a) and Bugliarello and Okazaki (2020) proposed source-side dependency-aware Transformer NMT. Wang et al (2019a) applied a constraint based on dependency relations between tokens to the encoder self-attentions.…”
Section: Related Work (mentioning)
confidence: 99%
“…One of its characteristics is the self-attention mechanism, which computes the strength of the relationship between two words in a sentence. Transformer NMT has been improved by extending the self-attention mechanism to incorporate syntactic information (Wang et al, 2019b; Omote et al, 2019; Deguchi et al, 2019; Wang et al, 2019a; Bugliarello and Okazaki, 2020). In particular, Deguchi et al (2019) and Wang et al (2019a) have proposed dependency-based self-attentions, which are trained to attend to the syntactic parent of each token under constraints based on the dependency relations, to capture sentence structure.…”
Section: Introduction (mentioning)
confidence: 99%
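The excerpt above describes attention heads trained, under dependency-based constraints, to attend to each token's syntactic parent. One common way to realise such a constraint is an auxiliary cross-entropy term on a chosen head; the sketch below illustrates that idea under that assumption, and is not the cited authors' exact formulation.

```python
import numpy as np

def parent_attention_loss(attn_head, parent_pos, eps=1e-9):
    """Auxiliary loss pushing one attention head towards the dependency tree.

    attn_head  : (seq_len, seq_len) softmax-normalised weights of one head;
                 row t is token t's attention distribution over all positions.
    parent_pos : (seq_len,) gold dependency-parent index of each token.

    Returns the mean negative log-probability that each token attends to its
    syntactic parent, i.e. a cross-entropy constraint on the selected head.
    """
    parent_pos = np.asarray(parent_pos)
    seq_len = attn_head.shape[0]
    prob_on_parent = attn_head[np.arange(seq_len), parent_pos]
    return float(-np.mean(np.log(prob_on_parent + eps)))
```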
“…In the work of Zhang et al (2019), the authors employed a supervised encoder-decoder dependency parser and used the outputs of its encoder as syntax-aware representations of words, which, in turn, are concatenated to the input embeddings of the translation model. Most recently, Bugliarello and Okazaki (2020) proposed a dependency-aware self-attention in the Transformer that needs no extra parameters. For a pivot word, its self-attention scores with other words are weighted according to their distances from the dependency parent of the pivot word.…”
Section: Related Work (mentioning)
confidence: 99%
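The first approach in the excerpt above, concatenating a parser encoder's outputs to the translation model's input embeddings, can be sketched in a few lines; the names and dimensions below are illustrative assumptions, not the cited authors' code.

```python
import numpy as np

def syntax_aware_inputs(word_embeddings, parser_encoder_states):
    """Concatenate word embeddings with a dependency parser's encoder states.

    word_embeddings       : (seq_len, d_word) translation-model input embeddings.
    parser_encoder_states : (seq_len, d_syn) encoder outputs of a supervised
                            parser, used as syntax-aware word representations.
    Returns (seq_len, d_word + d_syn) syntax-enriched inputs.
    """
    return np.concatenate([word_embeddings, parser_encoder_states], axis=-1)
```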