Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1290

Look Harder: A Neural Machine Translation Model with Hard Attention

Abstract: Soft-attention based Neural Machine Translation (NMT) models have achieved promising results on several translation tasks. These models attend to all the words in the source sequence for each target token, which makes them ineffective for long-sequence translation. In this work, we propose a hard-attention based NMT model which selects a subset of source tokens for each target token to handle long-sequence translation effectively. Due to the discrete nature of the hard-attention mechanism, we design a reinforcement learning […]
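The abstract describes selecting a subset of source tokens for each target token. Below is a minimal, hypothetical sketch of that general idea using a simple top-k selection over attention scores; the paper itself learns the selection with a reinforcement-learning objective, which this sketch does not implement.

```python
import torch
import torch.nn.functional as F

def hard_attention(query, source_states, k=8):
    """query: (d,) decoder state for one target token.
    source_states: (src_len, d) encoder states.
    Builds a context vector from only k selected source tokens."""
    scores = source_states @ query              # (src_len,) relevance of each source token
    k = min(k, source_states.size(0))
    top_scores, top_idx = torch.topk(scores, k) # hard selection: keep only k positions
    weights = F.softmax(top_scores, dim=-1)     # normalise over the selected subset
    context = weights @ source_states[top_idx]  # (d,) context built from the subset
    return context, top_idx

# Toy usage: 50 source tokens, hidden size 512, one decoder query.
src = torch.randn(50, 512)
q = torch.randn(512)
ctx, selected = hard_attention(q, src, k=8)
```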

Cited by 16 publications (8 citation statements)
References 16 publications

Citation statements (ordered by relevance):
“…Gating techniques relying on sampling and straight-through gradient estimators are common (Bengio et al., 2013; Eigen et al., 2013; …). Conditional computation can also be addressed with reinforcement learning (Denoyer and Gallinari, 2014; Indurthi et al., 2019). Memory-augmented neural networks with sparse reads and writes have also been proposed in Rae et al. (2016) as a way to scale Neural Turing Machines (Graves et al., 2014).…”
Section: Related Work (mentioning)
confidence: 99%
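The excerpt above mentions gating techniques that rely on sampling and straight-through gradient estimators. Below is a rough, generic sketch of a straight-through binary gate; it illustrates the technique in general, not the formulation of any specific cited paper.

```python
import torch

def straight_through_gate(logits):
    """Sample hard 0/1 gates in the forward pass while letting gradients
    flow through the underlying probabilities in the backward pass."""
    probs = torch.sigmoid(logits)
    hard = torch.bernoulli(probs)           # discrete sample, not differentiable
    # Forward value equals the hard sample; backward gradient follows probs.
    return hard + probs - probs.detach()

logits = torch.randn(4, requires_grad=True)
gates = straight_through_gate(logits)
gates.sum().backward()                      # gradients reach the gate logits
```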
“…Since the recurrent neural network operates as a left-to-right sequential process, it significantly limits the model's capacity for parallel computation, and the sequential processing of the data can also cause parts of it to be lost. This problem is alleviated by the attention mechanism, which reduces the distance between any two positions in the translated data to 1, so that the current operation does not depend on the results of the preceding sequential operations and the system achieves better parallelism [32].…”
Section: Recurrent Neural Network With Attention Mechanism (mentioning)
confidence: 99%
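The excerpt above argues that attention removes the left-to-right dependency of recurrent networks by reducing the distance between any two positions to 1. A generic scaled dot-product self-attention sketch illustrates the point, with every output computed from every input in a single parallel step (this is a standard illustration, not code from the cited work).

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    """x: (seq_len, d). A single attention step mixes all pairs of positions."""
    d = x.size(-1)
    scores = x @ x.transpose(0, 1) / d ** 0.5  # (seq_len, seq_len) pairwise scores
    weights = F.softmax(scores, dim=-1)        # each row is a distribution over the sequence
    return weights @ x                         # every output position sees every input position

x = torch.randn(10, 64)
out = self_attention(x)                        # computed in parallel, no recurrence needed
```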
“…Similarly, others raise the question of whether 16 attention heads are really necessary to obtain competitive performance. Finally, several recent works address the computational challenge of modeling very long sequences and modify the Transformer architecture with attention operations that reduce time complexity (Shen et al., 2018; Sukhbaatar et al., 2019; Dai et al., 2019; Indurthi et al., 2019; Kitaev et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%
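The excerpt above refers to attention variants that reduce time complexity for very long sequences. One common idea, sketched below as a generic illustration (not the exact mechanism of any of the cited works), is to restrict each query to a local window of keys instead of the full sequence.

```python
import torch
import torch.nn.functional as F

def local_attention(x, window=16):
    """x: (seq_len, d). Each position attends only to a window around itself,
    so the cost is O(seq_len * window) rather than O(seq_len ** 2)."""
    seq_len, d = x.shape
    outputs = []
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        keys = x[lo:hi]                         # only a local slice of the sequence
        scores = keys @ x[i] / d ** 0.5         # (hi - lo,) scores for this query
        weights = F.softmax(scores, dim=-1)
        outputs.append(weights @ keys)          # local context vector
    return torch.stack(outputs)                 # (seq_len, d)

out = local_attention(torch.randn(1000, 64), window=16)
```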