2022
DOI: 10.1007/s10994-021-06070-y
A study of BERT for context-aware neural machine translation

Citations: cited by 19 publications (10 citation statements)
References: 28 publications
“…Among them, translators play an important role in translation. This is a research analysis of how to quickly identify a translator model with high accuracy and robustness [11]. The basic framework of machine translation is shown in Figure 2.…”
Section: Methods
confidence: 99%
“…Alternatively, sentence-level translations can be refined via reinforcement learning (Xiong et al., 2019; Mansimov et al., 2021) or monolingual repair to post-edit contextual errors in the target language (Voita et al., 2019a). More recently, the use of pretrained language models has been explored for the task, using them to encode the context (Wu et al., 2022) or to initialize NMT models (Huang et al., 2023). Other studies directly use Large Language Models to perform translations, showing that competitive results can be obtained with this approach, although they might still make critical errors in domains such as literature and sometimes perform worse than conventional NMT models in contrastive tests (Wang et al., 2023; Karpinska and Iyyer, 2023; Hendy et al., 2023).…”
Section: Related Work
confidence: 99%
“…A high-capacity teacher network predicts soft label targets for a small student network to learn from, hence distilling knowledge from the larger network into the smaller one (Hinton et al., 2015) [114]. Knowledge distillation has also been applied in NLP for pre-training, as in DistilBERT, a distilled version of BERT (Sanh et al., 2019) [115], and BERT has been leveraged to encode contextual information by aggregating features (Wu et al.) [116]. In this setup, knowledge loops from teacher to student in language modeling, with the task of predicting the masked token.…”
Section: Knowledge Distillation
confidence: 99%
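
The excerpt above describes the standard soft-label distillation objective of Hinton et al. (2015): the student is trained both on the hard labels and on the teacher's temperature-softened output distribution. The following is a minimal sketch of that loss in PyTorch; the temperature and interpolation weight are illustrative assumptions, not values taken from the cited papers.

```python
# Minimal sketch of soft-label knowledge distillation (Hinton et al., 2015).
# Hyperparameters (temperature, alpha) are illustrative assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend cross-entropy on hard labels with a KL term that pushes the
    student toward the teacher's temperature-softened distribution."""
    # Soft targets produced by the high-capacity teacher.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student soft distributions,
    # scaled by T^2 as in Hinton et al. (2015).
    kd_term = F.kl_div(log_student, soft_targets,
                       reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the hard labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term


if __name__ == "__main__":
    # Random tensors stand in for teacher/student outputs over a vocabulary.
    batch, vocab = 4, 10
    teacher_logits = torch.randn(batch, vocab)
    student_logits = torch.randn(batch, vocab, requires_grad=True)
    labels = torch.randint(0, vocab, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())
```

In a masked-language-modeling setting such as the DistilBERT example mentioned in the excerpt, the same loss would be applied per masked position, with the vocabulary as the label space.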