Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.147

Enhancing Machine Translation with Dependency-Aware Self-Attention

Abstract: Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism. In this work, we investigate different approaches to incorporate syntactic knowledge in the Transformer model and also propose a novel, parameter-free, dependency-aware self-attention mechanism that improves its translation quality, especially for long sentences and in low-resource scenarios. We show the efficacy of each approach on WMT English↔Germ…

Cited by 56 publications (47 citation statements) | References 27 publications
“…Wu et al (2019) introduce dynamic convolutions that predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements. To adjust attention weights beyond SAN, Shaw et al (2018) extend the self-attention mechanism to efficiently consider representations of the relative positions or distances between sequence elements by adding a relative position embedding to the key vectors; Bugliarello and Okazaki (2019) transform the distance between two nodes in the dependency tree with a pre-defined Gaussian weighting function and multiply the result with the key-query inner product value; Dai et al (2019) present a relative position encoding scheme that adds an additional relative position representation to the key-query computation. Sukhbaatar et al (2019a) propose a parameterized linear function over self-attention to learn the optimal attention span, significantly extending the maximum context size used in the Transformer.…”
Section: Related Work (mentioning)
confidence: 99%
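To make the weighting summarized in the excerpt above concrete, here is a minimal single-head NumPy sketch of the dependency-aware attention attributed to Bugliarello and Okazaki: the key-query scores of each token are multiplied by a pre-defined Gaussian of each key position's distance from that token's dependency parent, adding no learned parameters. Function and variable names, and the placement of the weighting before the softmax, are assumptions based on the wording of the excerpt, not the authors' reference implementation.

```python
import numpy as np

def parent_scaled_attention(Q, K, V, parent_pos, sigma=1.0):
    """Minimal single-head sketch of dependency-aware self-attention.

    Q, K, V    : (seq_len, d_k) query/key/value matrices for one head.
    parent_pos : (seq_len,) index of each token's dependency parent.
    sigma      : standard deviation of the pre-defined Gaussian weighting.
    """
    parent_pos = np.asarray(parent_pos)
    seq_len, d_k = Q.shape
    # Standard scaled dot-product (key-query) scores.
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len)
    # Distance of every key position j from each query token's parent.
    positions = np.arange(seq_len)
    dist = positions[None, :] - parent_pos[:, None]     # (seq_len, seq_len)
    # Parameter-free Gaussian weighting multiplied into the raw scores.
    gauss = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    weighted = scores * gauss
    # Softmax over the key dimension.
    weighted = weighted - weighted.max(axis=-1, keepdims=True)
    attn = np.exp(weighted)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ V                                     # (seq_len, d_k)
```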
“…Recently, the Transformer (Vaswani et al, 2017) has been widely applied in various natural language processing tasks, such as neural machine translation (Vaswani et al, 2017) and text summarization. To further improve the quality of text representations, Transformer-based variants have attracted a lot of attention (Sukhbaatar et al, 2019a,b; Bugliarello and Okazaki, 2019; Ma et al, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
“…Some researchers have proposed syntax-aware self-attentions that are trained using dependency-based constraints. For instance, Wang et al (2019a) and Bugliarello and Okazaki (2020) proposed source-side dependency-aware Transformer NMT. Wang et al (2019a) applied a constraint based on dependency relations between tokens to the encoder self-attentions.…”
Section: Related Work (mentioning)
confidence: 99%
“…One of its characteristics is the self-attention mechanism, which computes the strength of the relationship between two words in a sentence. Transformer NMT has been improved by extending the self-attention mechanism to incorporate syntactic information (Wang et al, 2019b; Omote et al, 2019; Deguchi et al, 2019; Wang et al, 2019a; Bugliarello and Okazaki, 2020). In particular, Deguchi et al (2019) and Wang et al (2019a) have proposed dependency-based self-attentions, which are trained to attend to the syntactic parent of each token under constraints based on the dependency relations, to capture sentence structure.…”
Section: Introduction (mentioning)
confidence: 99%
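The excerpt above describes attention heads trained, under dependency-based constraints, to attend to each token's syntactic parent. One common way to realise such a constraint is an auxiliary cross-entropy term on a chosen head; the sketch below illustrates that idea under that assumption, and is not the cited authors' exact formulation.

```python
import numpy as np

def parent_attention_loss(attn_head, parent_pos, eps=1e-9):
    """Auxiliary loss pushing one attention head towards the dependency tree.

    attn_head  : (seq_len, seq_len) softmax-normalised weights of one head;
                 row t is token t's attention distribution over all positions.
    parent_pos : (seq_len,) gold dependency-parent index of each token.

    Returns the mean negative log-probability that each token attends to its
    syntactic parent, i.e. a cross-entropy constraint on the selected head.
    """
    parent_pos = np.asarray(parent_pos)
    seq_len = attn_head.shape[0]
    prob_on_parent = attn_head[np.arange(seq_len), parent_pos]
    return float(-np.mean(np.log(prob_on_parent + eps)))
```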
“…In the work of Zhang et al (2019), the authors employed a supervised encoder-decoder dependency parser and used the outputs of its encoder as syntax-aware representations of words, which, in turn, are concatenated to the input embeddings of the translation model. Most recently, Bugliarello and Okazaki (2020) proposed a dependency-aware self-attention in the Transformer that needs no extra parameters. For a pivot word, its self-attention scores with other words are weighted according to their distances from the dependency parent of the pivot word.…”
Section: Related Work (mentioning)
confidence: 99%
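The first approach in the excerpt above, concatenating a parser encoder's outputs to the translation model's input embeddings, can be sketched in a few lines; the names and dimensions below are illustrative assumptions, not the cited authors' code.

```python
import numpy as np

def syntax_aware_inputs(word_embeddings, parser_encoder_states):
    """Concatenate word embeddings with a dependency parser's encoder states.

    word_embeddings       : (seq_len, d_word) translation-model input embeddings.
    parser_encoder_states : (seq_len, d_syn) encoder outputs of a supervised
                            parser, used as syntax-aware word representations.
    Returns (seq_len, d_word + d_syn) syntax-enriched inputs.
    """
    return np.concatenate([word_embeddings, parser_encoder_states], axis=-1)
```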