“…A pre-trained language model (PLM) is trained on a large corpus and uses these data to learn semantic representations of the knowledge contained in the text, which can then be applied to downstream tasks. These downstream natural language processing tasks include classification (Li et al, 2019b; Maltoudoglou et al, 2022; Ni et al, 2020a, 2020b), sequence labeling (Dai et al, 2019; Li et al, 2020b), summarization (Chintagunta et al, 2021; Lacson et al, 2006; Yuan et al, 2021), translation (Névéol et al, 2018; Nobel et al, 2021; Wang et al, 2019), and generation (Melamud & Shivade, 2019; Peng et al, 2019; Xiong et al, 2019), among others. For the translation task, Zhu et al (2020) found that using the pre-trained language model to provide contextual embeddings, rather than fine-tuning it directly, yields better results.…”
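To illustrate the distinction referred to in the last sentence, the following is a minimal sketch, assuming PyTorch and the Hugging Face transformers library, of the two ways a PLM can serve a downstream model: as a frozen contextual-embedding extractor feeding a separate module, or fine-tuned end to end together with that module. The checkpoint name and the downstream layer are illustrative placeholders, not the actual setup used by Zhu et al (2020).

```python
# Minimal sketch: frozen PLM as contextual-embedding extractor vs. direct fine-tuning.
# Assumes PyTorch + Hugging Face transformers; the checkpoint and downstream head are
# illustrative placeholders, not the configuration of Zhu et al (2020).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
plm = AutoModel.from_pretrained("bert-base-uncased")

# Option A: contextual-embedding use -- freeze the PLM and pass its hidden states
# to a separately trained downstream module (here a toy Transformer encoder layer).
for p in plm.parameters():
    p.requires_grad = False  # PLM weights stay fixed; only the downstream module learns

downstream = nn.TransformerEncoderLayer(
    d_model=plm.config.hidden_size, nhead=8, batch_first=True
)

inputs = tokenizer("example source sentence", return_tensors="pt")
with torch.no_grad():
    embeddings = plm(**inputs).last_hidden_state  # (batch, seq_len, hidden)
features = downstream(embeddings)  # gradients flow only through `downstream`

# Option B: direct fine-tuning -- the PLM's parameters are updated jointly
# with the downstream module on the task objective.
plm_ft = AutoModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(
    list(plm_ft.parameters()) + list(downstream.parameters()), lr=2e-5
)
```

In Option A the PLM acts purely as a feature provider, which keeps its pre-trained representations intact; in Option B all weights are adapted to the task, which is the "direct fine-tuning" baseline that Zhu et al (2020) compared against.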