2022
DOI: 10.1109/access.2022.3159346
Simple and Effective Multimodal Learning Based on Pre-Trained Transformer Models

Abstract: Transformer-based models have been garnering attention in various fields. Beginning with their success in natural language processing, they have also succeeded in other fields such as image recognition and automatic speech recognition. Beyond models trained on unimodal information, many transformer-based models have been proposed for multimodal information. A common problem in multimodal learning is insufficient multimodal training data. This study atte…

Cited by 10 publications (1 citation statement)
References 31 publications
“…Of the 5 papers in the visual domain, classification [376], detection [377], prediction [378], identification [379] and extraction [380] have 1. The entertainment domain has 6 relevant articles, where 2 were in segmentation [381], [382], 1 was in prediction [383], 2 were in detection [384], [385] and 1 was in recognition [386].…”
Section: Inclusion Criteria
confidence: 99%