As the performance of deep learning models has improved, the number of model parameters has grown exponentially. More parameters mean more computation and longer training, i.e., higher training cost. To reduce this cost, we propose Compositional Intelligence (CI), a reuse method that combines models pre-trained for different tasks. Since CI builds on well-trained models, good performance on the target task can be expected at a small training cost. We applied CI to the image captioning task. While a pre-trained feature extractor is commonly used in image captioning, the caption generator is usually trained from scratch. In contrast, we also pre-trained the Transformer-based caption generator and applied CI, i.e., we combined a pre-trained feature extractor with a pre-trained caption generator. To compare the training cost of the From Scratch model and the CI model, early stopping was applied during fine-tuning on the image captioning task. On the MS-COCO dataset, the vanilla image captioning model reduced training cost by 13.8% while improving performance by up to 3.2%, and the Object Relation Transformer model reduced training cost by 21.3%.