2021
DOI: 10.48550/arxiv.2103.17151
Preprint

Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models

Abstract: Multi-encoder models are a broad family of context-aware Neural Machine Translation (NMT) systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is undertaken by contextual parameters, trained on document-level data. In this work, we show that training these parameters takes a large amount of data, since the contextual training signal is sparse. We propose an efficient alternative, based on splitting sentence pairs, th…
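
Since the abstract is truncated here, the following is only a minimal sketch of the general idea of sentence-pair splitting as the citing papers describe it (splitting sentences into smaller segments so that every training example carries a contextual signal); the helper names split_pair and make_context_examples are hypothetical and not taken from the paper.

from typing import List, Tuple

def split_pair(src: str, tgt: str) -> List[Tuple[str, str]]:
    # Split a source/target sentence pair roughly in half, token-wise.
    # Assumes the two halves stay approximately aligned, which is only a
    # rough approximation; the paper's actual splitting criterion may differ.
    src_toks, tgt_toks = src.split(), tgt.split()
    s_mid, t_mid = len(src_toks) // 2, len(tgt_toks) // 2
    return [
        (" ".join(src_toks[:s_mid]), " ".join(tgt_toks[:t_mid])),
        (" ".join(src_toks[s_mid:]), " ".join(tgt_toks[t_mid:])),
    ]

def make_context_examples(corpus: List[Tuple[str, str]]) -> List[dict]:
    # Turn sentence-level pairs into (context, current) examples for a
    # multi-encoder model: the first segment becomes the context of the second.
    examples = []
    for src, tgt in corpus:
        (ctx_src, _ctx_tgt), (cur_src, cur_tgt) = split_pair(src, tgt)
        examples.append({
            "context_src": ctx_src,   # fed to the context encoder
            "current_src": cur_src,   # fed to the main encoder
            "current_tgt": cur_tgt,   # decoder target
        })
    return examples

The point of such a transformation is that contextual parameters see a useful training signal in every example, rather than only in the comparatively rare sentences that genuinely depend on their document context.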

Cited by 3 publications (4 citation statements)
References 32 publications
“…Since only a few tokens within a sentence reveal the formality, context-aware NMT models are difficult to train and evaluate for consistent stylistic translation. First, because this sparsity (Lupo et al., 2021) also occurs within the context, a training alternative is needed to amplify the weak training signal. In addition, BLEU (Papineni et al., 2002), a standard translation quality metric, often fails to capture improvements that make only subtle differences.…”
Section: Related Work
confidence: 99%
“…Since recent context-aware NMT models (Maruf et al., 2019; Bao et al., 2021) incorporate the context into the NMT task, these models can be used to infer the appropriate formality from the context and, accordingly, generate consistent translations. However, due to the sparsity of context (Lupo et al., 2021), it is challenging to capture the few formality-related words scattered within the context. Yin et al. (2021) guide the model to concentrate on the relevant contexts by collecting a human-annotated contextual dataset, but this method resorts to manual annotation, which is time-consuming and labor-intensive.…”
Section: Introduction
confidence: 99%
“…Miculicich et al. (2018) proposed a hierarchical attention mechanism to capture discourse information, while Maruf et al. (2019) employed a selective attention module to select the most relevant information in the context. Lupo et al. (2021) further improved these methods by splitting sentences into smaller segments to overcome the training sparsity problem. Recently, Tiedemann and Scherrer (2017) and Ma et al. (2020) suggested that the Transformer can translate multiple sentences directly, and this document-by-document paradigm further reduces context-related errors.…”
Section: Document-level Neural Machine Translation
confidence: 99%
“…Document-level neural machine translation can be broadly divided into two categories: the sentence-to-sentence (sen2sen) approach and the document-to-document (doc2doc) approach (Maruf et al., 2021). The former feeds the context as additional information to assist the translation of each sentence in the document independently, which is also known as the multi-encoder method (Lupo et al., 2022). However, the scarcity of datasets and the sparsity of the contextual information make these models hard to train. Lupo et al. (2022) further address this problem by splitting sentences into smaller pieces to augment the document-level corpus.…”
Section: Document-level Neural Machine Translation
confidence: 99%
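
The multi-encoder pattern referred to in the statements above can be illustrated with a rough PyTorch sketch. This is an assumption-laden illustration of the general idea (a separate context encoder whose states are merged into the current-sentence states via cross-attention and a learned gate), not the exact architecture of Lupo et al. (2021/2022) or of the other cited models.

import torch
import torch.nn as nn

class MultiEncoderIntegration(nn.Module):
    # Illustrative only: encode the context sentence with its own parameters,
    # let the current-sentence states attend to it, then gate the mixture.
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, current_states: torch.Tensor, context_embeddings: torch.Tensor) -> torch.Tensor:
        # Encode the context with the contextual parameters.
        ctx = self.context_encoder(context_embeddings)
        # Current-sentence states attend to the encoded context.
        attended, _ = self.cross_attn(current_states, ctx, ctx)
        # Gate how much contextual information flows into the output.
        g = torch.sigmoid(self.gate(torch.cat([current_states, attended], dim=-1)))
        return g * attended + (1 - g) * current_states

# Usage example (shapes only):
#   x = torch.randn(2, 10, 512); ctx = torch.randn(2, 20, 512)
#   out = MultiEncoderIntegration()(x, ctx)   # -> shape (2, 10, 512)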