2020
DOI: 10.5715/jnlp.27.445

Syntax-based Transformer for Neural Machine Translation

Abstract: The Transformer (Vaswani et al. 2017), which relies purely on the attention mechanism, has achieved state-of-the-art performance on machine translation (MT). However, syntactic information, which has improved many previous MT models, has not been utilized explicitly by the Transformer. We propose a syntax-based Transformer for MT, which incorporates source-side syntax structures generated by the parser into the self-attention and positional encoding of the encoder. Our method is general in that it is applicable to both constituent trees a…
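The abstract states the idea only at a high level, so the following is a minimal, hedged sketch of one way source-side syntax could enter the encoder's self-attention: attention scores are penalized by a precomputed matrix of pairwise parse-tree distances, with a learnable per-head scale. The class name, the softplus-scaled penalty, and the toy shapes are illustrative assumptions, not the paper's actual formulation (which also modifies the positional encoding).

```python
# Sketch only: syntax-aware encoder self-attention biased by parse-tree distances.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SyntaxBiasedSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # one learnable scale per head for the syntactic-distance penalty (assumption)
        self.dist_scale = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x, tree_dist):
        # x: (batch, seq, d_model); tree_dist: (batch, seq, seq) pairwise tree distances
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(z):
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # tokens that are close in the parse tree are penalized less
        penalty = F.softplus(self.dist_scale).view(1, -1, 1, 1) * tree_dist.unsqueeze(1)
        attn = (scores - penalty).softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx)

if __name__ == "__main__":
    x = torch.randn(2, 7, 512)
    dist = torch.randint(0, 6, (2, 7, 7)).float()
    print(SyntaxBiasedSelfAttention(512, 8)(x, dist).shape)  # torch.Size([2, 7, 512])
```

Constituent and dependency trees would differ only in how `tree_dist` is computed before being passed in.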

Cited by 5 publications (3 citation statements). References 23 publications.
“…The study chose the International Spoken Language and its Translation Review Contest (IWSLT) 2019 data, which has a small data size, as the dataset for this experiment, including 220,000 Chinese-English parallel utterance pairs, pairs of test set data, and pairs of development data [26]. Since the LSTM attentional embedding-based English machine translation model could not be trained and learned directly on the IWSLT 2019 dataset, word vector transformation of the dataset was also required [27].…”
Section: Dataset Sources and Preprocessing (mentioning)
confidence: 99%
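The quoted study does not show its "word vector transformation" step; the sketch below illustrates what such preprocessing typically involves, assuming a whitespace-tokenized toy corpus, a frequency-based vocabulary, and an embedding size (128) made up for illustration.

```python
# Hedged sketch of word-vector transformation: tokens -> integer ids -> dense vectors.
from collections import Counter
import torch
import torch.nn as nn

def build_vocab(sentences, min_freq=1, specials=("<pad>", "<unk>")):
    counts = Counter(tok for sent in sentences for tok in sent)
    itos = list(specials) + [w for w, c in counts.most_common() if c >= min_freq]
    return {w: i for i, w in enumerate(itos)}

def to_ids(sentence, vocab):
    unk = vocab["<unk>"]
    return torch.tensor([vocab.get(tok, unk) for tok in sentence])

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
vocab = build_vocab(corpus)
embed = nn.Embedding(len(vocab), 128, padding_idx=vocab["<pad>"])
vectors = embed(to_ids(corpus[0], vocab))  # (3, 128) word vectors for one sentence
print(vectors.shape)
```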
“…where y_t, W_w, and b_w are attention mechanism layer parameters and a_t represents the weight of the sequence input at the t-th time point in the whole input. Therefore, through the attention mechanism layer, the input vector v_t with a weight expression can be obtained; the calculation formula is shown in (6).…”
Section: A Topic Segmentation Model Based On Model Transfer (mentioning)
confidence: 99%
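Equation (6) itself is not reproduced in the quote, so the following is only an assumed reconstruction of the additive attention layer it describes: scores are computed from tanh(W_w y_t + b_w), normalized into weights a_t, and applied to the inputs to produce the weighted vectors v_t. The scoring vector u_w and all dimensions are illustrative assumptions.

```python
# Assumed form of the quoted attention layer, not the cited paper's exact equation (6).
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    def __init__(self, d_hidden: int):
        super().__init__()
        self.W_w = nn.Linear(d_hidden, d_hidden)       # W_w and b_w from the quote
        self.u_w = nn.Linear(d_hidden, 1, bias=False)  # scoring vector (assumption)

    def forward(self, y):
        # y: (batch, T, d_hidden) sequence of hidden states y_t
        u = torch.tanh(self.W_w(y))         # (batch, T, d_hidden)
        a = self.u_w(u).softmax(dim=1)      # (batch, T, 1) weights a_t over time steps
        v = a * y                           # (batch, T, d_hidden) weighted inputs v_t
        return v, a.squeeze(-1)

y = torch.randn(4, 10, 256)
v, a = WordAttention(256)(y)
print(v.shape, a.shape)  # torch.Size([4, 10, 256]) torch.Size([4, 10])
```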
“…Translation models can be divided into the following three categories according to different basic translation units and modelling methods: word-based translation, phrase-based translation, and syntax-based translation [6][7][8]. The word-based translation model takes the translated word pair as the basic translation unit.…”
Section: Introduction (mentioning)
confidence: 99%