Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.37
Learning Source Phrase Representations for Neural Machine Translation

Abstract: The Transformer translation model (Vaswani et al., 2017) based on a multi-head attention mechanism can be computed effectively in parallel and has significantly pushed forward the performance of Neural Machine Translation (NMT). Though intuitively the attentional network can connect distant words via shorter network paths than RNNs, empirical analysis demonstrates that it still has difficulty in fully capturing long-distance dependencies (Tang et al., 2018). Considering that modeling phrases instead of words h…
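Since the abstract is truncated above, the following is only a simplified, illustrative sketch of the general idea of pooling token-level encoder states into phrase-level representations. The fixed-length segmentation, the mean pooling, and the helper name phrase_representations are assumptions for illustration, not the paper's actual mechanism.

```python
# Illustrative sketch: split encoder token states into fixed-length segments
# and mean-pool each segment into a phrase vector. This is an assumption for
# illustration, not the mechanism described in the (truncated) abstract.
import torch


def phrase_representations(token_states: torch.Tensor, phrase_len: int = 4) -> torch.Tensor:
    """token_states: (seq_len, d_model) encoder states for one source sentence.
    Returns (num_phrases, d_model), one vector per fixed-length segment."""
    seq_len, d_model = token_states.shape
    pad = (-seq_len) % phrase_len  # pad so seq_len divides evenly into segments
    if pad:
        token_states = torch.cat(
            [token_states, token_states.new_zeros(pad, d_model)], dim=0)
    segments = token_states.view(-1, phrase_len, d_model)
    # Mean-pool the tokens within each segment (zero padding included in the last one).
    return segments.mean(dim=1)
```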

Cited by 13 publications (9 citation statements)
References 22 publications

Citation statements:
“…Of all the compared systems, MG-SH (Hao et al., 2019), Proto-TF (Yin et al., 2022) and UMST (Li et al., 2022a) use external tools to obtain syntactic information, which is time-consuming. Notably, our approach and LD (Xu et al., 2020) can work without any syntactic parsing tools and still yield prominent improvements. Apart from Multi-BLEU, we also report SacreBLEU in brackets.…”
Section: Results
confidence: 97%
“…Some works focus on finding phrase alignments between source and target sentences (Lample et al., 2018; Huang et al., 2018). Others focus on utilizing the source sentence phrase representations (Xu et al., 2020; Hao et al., 2019; Li et al., 2022a, 2023). However, these approaches often rely on time-consuming parsing tools to extract phrases.…”
Section: Introduction
confidence: 99%
“…The experimental results of various existing state-of-the-art (SOTA) models on the same dataset, including the Base Transformer and Big Transformer (Vaswani et al., 2017), the Evolved Transformer (So et al., 2019), Dynamic Programming Encoding NMT (Li et al., 2020), and the Phrase Representations Transformer (Xu et al., 2020), are quoted as a reference. For a fair comparison, we list the single best result reported in their papers.…”
Section: Results
confidence: 99%
“…
MODEL                                        BLEU
Transformer (base) (Vaswani et al., 2017)    27.3
Transformer (big) (Vaswani et al., 2017)     28.4
Evolved Transformer (So et al., 2019)        28.4
DPE-NMT (Li et al., 2020)                    27.61
Transformer base + PR (Xu et al., 2020)      28.67
Fairseq (baseline) (Ott et al., 2019)        27.44
BLT-NMT (Wei et al., 2019)                   27.93
LTR-NMT                                      28.18
Topic-enhanced NMT (ours)                    29.01

Here s_{<j} denotes the hidden states of the decoder, y_{j-1} is the output token at step j-1, t_{j-1} is the topic embedding for token y_{j-1}, and c_j is a context vector.…”
Section: Decoder Topic Embedding
confidence: 99%
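To make the notation in the preceding quote concrete, below is a minimal PyTorch sketch of one topic-enhanced decoder step that combines the previous token y_{j-1}, its topic embedding t_{j-1}, the context vector c_j, and the previous hidden state. The GRU cell, the concatenation scheme, and all dimensions are assumptions for illustration; the cited paper's actual architecture may differ.

```python
# Illustrative sketch only: a GRU-based decoder step with a topic embedding
# concatenated to the input. Not the cited paper's exact architecture.
import torch
import torch.nn as nn


class TopicAwareDecoderStep(nn.Module):
    def __init__(self, vocab_size, emb_dim, topic_dim, ctx_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Input at step j: [embedding of y_{j-1}; topic embedding t_{j-1}; context c_j]
        self.cell = nn.GRUCell(emb_dim + topic_dim + ctx_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, y_prev, t_prev, c_j, s_prev):
        # y_prev: (batch,)           previous output token ids y_{j-1}
        # t_prev: (batch, topic_dim) topic embedding t_{j-1}
        # c_j:    (batch, ctx_dim)   attention context vector at step j
        # s_prev: (batch, hid_dim)   previous decoder hidden state (summarising s_{<j})
        x = torch.cat([self.embed(y_prev), t_prev, c_j], dim=-1)
        s_j = self.cell(x, s_prev)
        logits = self.out(s_j)  # scores over the next token y_j
        return logits, s_j
```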
“…To analyze the effects of MHPLSTM on performance with increasing input length, we conducted a length analysis on the news test set of the WMT 14 En-De task. Following Bahdanau et al. (2015), Tu et al. (2016) and Xu et al. (2020b), we grouped sentences of similar lengths together and computed BLEU scores of the MHPLSTM and our baselines for each group. BLEU score results and decoding speed-ups of each group are shown in Figures 4 and 5 respectively.…”
Section: Length Analysis
confidence: 99%
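The length analysis quoted above buckets test sentences by length and scores each bucket separately. Below is a minimal sketch of that procedure using sacrebleu; the bucket width of 10 tokens and the helper name bleu_by_length are assumptions for illustration.

```python
# Minimal sketch of a per-length-bucket BLEU analysis with sacrebleu.
# Bucket width and helper name are illustrative assumptions.
from collections import defaultdict

import sacrebleu


def bleu_by_length(sources, references, hypotheses, bucket_size=10):
    buckets = defaultdict(lambda: ([], []))  # bucket index -> (refs, hyps)
    for src, ref, hyp in zip(sources, references, hypotheses):
        # Bucket by source sentence length: 1-10 tokens, 11-20 tokens, ...
        b = (len(src.split()) - 1) // bucket_size
        buckets[b][0].append(ref)
        buckets[b][1].append(hyp)
    scores = {}
    for b, (refs, hyps) in sorted(buckets.items()):
        label = f"{b * bucket_size + 1}-{(b + 1) * bucket_size}"
        scores[label] = sacrebleu.corpus_bleu(hyps, [refs]).score
    return scores
```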