2021
DOI: 10.3390/app11188737

Reusing Monolingual Pre-Trained Models by Cross-Connecting Seq2seq Models for Machine Translation

Abstract: This work uses sequence-to-sequence (seq2seq) models pre-trained on monolingual corpora for machine translation. We pre-train two seq2seq models with monolingual corpora for the source and target languages, then combine the encoder of the source-language model and the decoder of the target-language model, i.e., the cross-connection. We add an intermediate layer between the pre-trained encoder and decoder to help them map to each other, since the modules are pre-trained completely independently. These mon…
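The cross-connection described in the abstract can be pictured with a short sketch: two Transformer seq2seq halves, pre-trained separately on monolingual data, are joined, and a small feature mapping layer (FML) sits between the reused encoder and decoder. This is a minimal, hypothetical sketch in PyTorch, not the authors' code; the class name CrossConnectedNMT, the ReLU-sandwich FML, and all dimensions and layer counts are illustrative assumptions (the "overcomplete" FML mentioned by a citing paper is assumed here to mean a layer wider than the model dimension).

```python
# Minimal sketch of cross-connecting two independently pre-trained seq2seq
# halves with a feature mapping layer (FML) in between. Hypothetical code,
# not the paper's implementation; shapes and the FML design are assumptions.
import torch
import torch.nn as nn


class CrossConnectedNMT(nn.Module):
    def __init__(self, src_encoder, tgt_decoder, src_embed, tgt_embed,
                 tgt_vocab_size, d_model=512, fml_dim=1024):
        super().__init__()
        self.src_embed = src_embed    # embeddings from source-language pre-training
        self.encoder = src_encoder    # encoder from source-language pre-training
        # FML: bridges the independently learned feature spaces; the
        # "overcomplete" variant is assumed to expand beyond d_model first.
        self.fml = nn.Sequential(
            nn.Linear(d_model, fml_dim),
            nn.ReLU(),
            nn.Linear(fml_dim, d_model),
        )
        self.tgt_embed = tgt_embed    # embeddings from target-language pre-training
        self.decoder = tgt_decoder    # decoder from target-language pre-training
        self.generator = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.src_embed(src_ids))
        memory = self.fml(memory)     # map encoder features into the decoder's space
        tgt_len = tgt_ids.size(1)
        causal_mask = torch.triu(
            torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        hidden = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal_mask)
        return self.generator(hidden)


# Usage sketch: in practice the encoder, decoder, and embeddings would be
# loaded from the two monolingual pre-training runs rather than built fresh.
d_model, n_heads, vocab = 512, 8, 32000
enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
model = CrossConnectedNMT(
    src_encoder=nn.TransformerEncoder(enc_layer, num_layers=6),
    tgt_decoder=nn.TransformerDecoder(dec_layer, num_layers=6),
    src_embed=nn.Embedding(vocab, d_model),
    tgt_embed=nn.Embedding(vocab, d_model),
    tgt_vocab_size=vocab,
)
logits = model(torch.randint(0, vocab, (2, 7)), torch.randint(0, vocab, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 32000])
```

The same shape of construction covers the "compositional intelligence" reading in the citation statements below: an encoder trained on one dataset and a decoder trained on another are connected through the bridging layer rather than trained jointly from scratch.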

Cited by 4 publications (6 citation statements)
References 27 publications (38 reference statements)
“…So we need a layer that smoothly maps the different feature spaces. Previous studies also showed that the model with FML outperformed the model without FML [11,12]. From the experimental results in Section 4.3.2, the Overcomplete FML showed the best performance.…”
Section: Feature Mapping Layer (FML)
confidence: 77%
“…As shown in Figure 2, the Transformer has encoder and decoder structures. Therefore, CI can be applied by connecting the encoder trained with Data A and the decoder trained with Data B. Oh et al [12] conducted a study on translating languages by combining multiple Transformers trained on monolingual data. We extended the CI method to dual domains.…”
Section: Compositional Intelligence Methods (CI)
confidence: 99%