Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-short.115

BERTTune: Fine-Tuning Neural Machine Translation with BERTScore

Abstract: Neural machine translation models are often biased toward the limited translation references seen during training. To amend this form of overfitting, in this paper we propose fine-tuning the models with a novel training objective based on the recently-proposed BERTScore evaluation metric. BERTScore is a scoring function based on contextual embeddings that overcomes the typical limitations of n-gram-based metrics (e.g. synonyms, paraphrases), allowing translations that are different from the references, yet clo…
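
As a concrete illustration of the metric the abstract refers to, below is a minimal sketch of computing BERTScore for a few candidate translations against references with the open-source `bert_score` package (pip install bert-score). The example sentences and the `rescale_with_baseline` setting are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: scoring candidate translations with BERTScore.
# Requires: pip install bert-score  (example sentences are made up).
from bert_score import BERTScorer

candidates = ["The cat sat down on the mat.",
              "A fast brown fox jumps over the lazy dog."]
references = ["The cat was sitting on the mat.",
              "The quick brown fox jumps over the lazy dog."]

# lang="en" selects a default English model; rescale_with_baseline maps scores
# to a more readable range without changing their ranking.
scorer = BERTScorer(lang="en", rescale_with_baseline=True)
precision, recall, f1 = scorer.score(candidates, references)

for cand, score in zip(candidates, f1.tolist()):
    print(f"F_BERT = {score:.3f}  |  {cand}")
```

Note how the first candidate differs from its reference in wording yet still receives a high score, which is the behaviour the abstract contrasts with n-gram-based metrics.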

Cited by 7 publications (2 citation statements)
References 24 publications
“…MASS allows the decoder to predict successive sequence fragments to improve the decoder's language modelling capabilities. On the other hand, fine-tuning a strong NMT baseline with BERTScore can amend overfitting and effectively overcome the typical limitations of n-gram matching [9]; the authors use BERTScore as the objective function for fine-tuning. Wei et al. [10] leverage a pre-trained Transformer encoder based on contrastive learning to enrich the representations of bilingual…”
Section: Related Work
confidence: 99%
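
The statement above attributes to [9] the idea of using BERTScore as the fine-tuning objective. The paper's exact objective is not reproduced here; the sketch below only illustrates one generic way to plug a sentence-level BERTScore reward into fine-tuning, via a REINFORCE-style loss on a Hugging Face seq2seq model. The model name, sampling settings, and learning rate are placeholder assumptions.

```python
# Hypothetical sketch (not the BERTTune objective): fine-tuning an NMT model
# with sentence-level BERTScore F1 as a reward, using a REINFORCE-style loss.
# Model name, sampling settings, and learning rate are placeholder assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from bert_score import BERTScorer

model_name = "Helsinki-NLP/opus-mt-de-en"   # assumed pre-trained NMT baseline
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
scorer = BERTScorer(lang="en", rescale_with_baseline=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def fine_tune_step(src_sentences, ref_sentences):
    enc = tokenizer(src_sentences, return_tensors="pt",
                    padding=True, truncation=True)

    # 1) Sample translations from the current model (no gradient needed here).
    with torch.no_grad():
        sampled = model.generate(**enc, do_sample=True, top_k=50,
                                 max_new_tokens=64)
    hypotheses = tokenizer.batch_decode(sampled, skip_special_tokens=True)

    # 2) Sentence-level reward: BERTScore F1 against the references.
    _, _, f1 = scorer.score(hypotheses, ref_sentences)
    reward = f1.detach()

    # 3) Re-score the sampled sequences with teacher forcing to get their
    #    log-probabilities; drop the decoder start token and mask padding.
    labels = sampled[:, 1:].clone()
    labels[labels == tokenizer.pad_token_id] = -100
    out = model(**enc, labels=labels)
    log_probs = torch.log_softmax(out.logits, dim=-1)
    token_lp = log_probs.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    mask = (labels != -100).float()
    seq_lp = (token_lp * mask).sum(dim=-1)      # log p(sample | source)

    # 4) REINFORCE with a mean-reward baseline: reinforce samples whose
    #    BERTScore is above the batch average, penalise the rest.
    advantage = (reward - reward.mean()).to(seq_lp.device)
    loss = -(advantage * seq_lp).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), reward.mean().item()
```

With a single sentence pair per batch the advantage collapses to zero, so in practice one would sample several hypotheses per source or keep a running baseline.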
“…Various methods have been proposed to tackle this problem (Pan et al., 2023). From training-time correction (Li et al., 2019; Jauregi Unanue et al., 2021; Zelikman et al., 2022; Huang et al., 2022) to post-output-generation refinement (Madaan et al., 2023; Shinn et al., 2023; Zhang et al., 2023; Pan et al., 2023; Yu et al., 2023; Gou et al., 2023; Paul et al., 2023; Akyurek et al., 2023), these methods have shown the impact that iterative self-refinement and proper feedback can have on the performance of LLMs.…”
Section: Introduction
confidence: 99%