Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.37
Learning Source Phrase Representations for Neural Machine Translation

Abstract: The Transformer translation model (Vaswani et al., 2017) based on a multi-head attention mechanism can be computed effectively in parallel and has significantly pushed forward the performance of Neural Machine Translation (NMT). Though intuitively the attentional network can connect distant words via shorter network paths than RNNs, empirical analysis demonstrates that it still has difficulty in fully capturing long-distance dependencies (Tang et al., 2018). Considering that modeling phrases instead of words h…
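Since the abstract is truncated above, the following is only a simplified, illustrative sketch of the general idea of pooling token-level encoder states into phrase-level representations. The fixed-length segmentation, the mean pooling, and the helper name phrase_representations are assumptions for illustration, not the paper's actual mechanism.

```python
# Illustrative sketch: split encoder token states into fixed-length segments
# and mean-pool each segment into a phrase vector. This is an assumption for
# illustration, not the mechanism described in the (truncated) abstract.
import torch


def phrase_representations(token_states: torch.Tensor, phrase_len: int = 4) -> torch.Tensor:
    """token_states: (seq_len, d_model) encoder states for one source sentence.
    Returns (num_phrases, d_model), one vector per fixed-length segment."""
    seq_len, d_model = token_states.shape
    pad = (-seq_len) % phrase_len  # pad so seq_len divides evenly into segments
    if pad:
        token_states = torch.cat(
            [token_states, token_states.new_zeros(pad, d_model)], dim=0)
    segments = token_states.view(-1, phrase_len, d_model)
    # Mean-pool the tokens within each segment (zero padding included in the last one).
    return segments.mean(dim=1)
```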

Cited by 13 publications (9 citation statements)
References 22 publications

Citation statements:
“…Of all the compared systems, MG-SH (Hao et al., 2019), Proto-TF (Yin et al., 2022) and UMST (Li et al., 2022a) use external tools to obtain syntactic information, which is time-consuming. Notably, our approach and LD (Xu et al., 2020) can work without any syntactic parsing tools and still yield prominent improvements. Apart from Multi-BLEU, we also report SacreBLEU in brackets.…”
Section: Results
confidence: 97%
“…Some works focus on finding phrase alignments between source and target sentences (Lample et al., 2018; Huang et al., 2018). Others focus on utilizing the source sentence phrase representations (Xu et al., 2020; Hao et al., 2019; Li et al., 2022a, 2023). However, these approaches often rely on time-consuming parsing tools to extract phrases.…”
Section: Introduction
confidence: 99%
“…The experimental results of various existing state-of-the-art (SOTA) models on the same dataset, including the Base Transformer and Big Transformer (Vaswani et al., 2017), the Evolved Transformer (So et al., 2019), Dynamic Programming Encoding NMT (Li et al., 2020), and the Phrase Representations Transformer (Xu et al., 2020), are quoted as a reference. For a fair comparison, we list the single best result reported in their papers.…”
Section: Results
confidence: 99%
“…
MODEL                                        BLEU
Transformer (base) (Vaswani et al., 2017)    27.3
Transformer (big) (Vaswani et al., 2017)     28.4
Evolved Transformer (So et al., 2019)        28.4
DPE-NMT (Li et al., 2020)                    27.61
Transformer base + PR (Xu et al., 2020)      28.67
Fairseq (baseline) (Ott et al., 2019)        27.44
BLT-NMT (Wei et al., 2019)                   27.93
LTR-NMT                                      28.18
Topic-enhanced NMT (ours)                    29.01

Here s_{<j} denotes the hidden states of the decoder, y_{j-1} is the output token at step j-1, t_{j-1} is the topic embedding for token y_{j-1}, and c_j is a context vector.…”
Section: Decoder Topic Embedding
confidence: 99%
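To make the notation in the preceding quote concrete, below is a minimal PyTorch sketch of one topic-enhanced decoder step that combines the previous token y_{j-1}, its topic embedding t_{j-1}, the context vector c_j, and the previous hidden state. The GRU cell, the concatenation scheme, and all dimensions are assumptions for illustration; the cited paper's actual architecture may differ.

```python
# Illustrative sketch only: a GRU-based decoder step with a topic embedding
# concatenated to the input. Not the cited paper's exact architecture.
import torch
import torch.nn as nn


class TopicAwareDecoderStep(nn.Module):
    def __init__(self, vocab_size, emb_dim, topic_dim, ctx_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Input at step j: [embedding of y_{j-1}; topic embedding t_{j-1}; context c_j]
        self.cell = nn.GRUCell(emb_dim + topic_dim + ctx_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, y_prev, t_prev, c_j, s_prev):
        # y_prev: (batch,)           previous output token ids y_{j-1}
        # t_prev: (batch, topic_dim) topic embedding t_{j-1}
        # c_j:    (batch, ctx_dim)   attention context vector at step j
        # s_prev: (batch, hid_dim)   previous decoder hidden state (summarising s_{<j})
        x = torch.cat([self.embed(y_prev), t_prev, c_j], dim=-1)
        s_j = self.cell(x, s_prev)
        logits = self.out(s_j)  # scores over the next token y_j
        return logits, s_j
```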
“…To analyze the effects of MHPLSTM on performance with increasing input length, we conducted a length analysis on the news test set of the WMT 14 En-De task. Following Bahdanau et al. (2015), Tu et al. (2016) and Xu et al. (2020b), we grouped sentences of similar lengths together and computed BLEU scores of the MHPLSTM and our baselines for each group. BLEU score results and decoding speed-ups of each group are shown in Figures 4 and 5 respectively.…”
Section: Length Analysis
confidence: 99%
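The length analysis quoted above buckets test sentences by length and scores each bucket separately. Below is a minimal sketch of that procedure using sacrebleu; the bucket width of 10 tokens and the helper name bleu_by_length are assumptions for illustration.

```python
# Minimal sketch of a per-length-bucket BLEU analysis with sacrebleu.
# Bucket width and helper name are illustrative assumptions.
from collections import defaultdict

import sacrebleu


def bleu_by_length(sources, references, hypotheses, bucket_size=10):
    buckets = defaultdict(lambda: ([], []))  # bucket index -> (refs, hyps)
    for src, ref, hyp in zip(sources, references, hypotheses):
        # Bucket by source sentence length: 1-10 tokens, 11-20 tokens, ...
        b = (len(src.split()) - 1) // bucket_size
        buckets[b][0].append(ref)
        buckets[b][1].append(hyp)
    scores = {}
    for b, (refs, hyps) in sorted(buckets.items()):
        label = f"{b * bucket_size + 1}-{(b + 1) * bucket_size}"
        scores[label] = sacrebleu.corpus_bleu(hyps, [refs]).score
    return scores
```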