2022
DOI: 10.3390/electronics11050764
Realistic Image Generation from Text by Using BERT-Based Embedding

Abstract: Recently, in the field of artificial intelligence, multimodal learning has received a lot of attention due to expectations for the enhancement of AI performance and potential applications. Text-to-image generation, which is one of the multimodal tasks, is a challenging topic in computer vision and natural language processing. The text-to-image generation model based on generative adversarial network (GAN) utilizes a text encoder pre-trained with image-text pairs. However, text encoders pre-trained with image-t…
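As context for the abstract, the following is a minimal sketch of how a BERT-based sentence embedding can condition a text-to-image GAN generator. It assumes the Hugging Face transformers library; the model name, embedding choice ([CLS] pooling), and dimensions are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: BERT-based text embedding as conditioning input for a GAN generator.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

caption = "a small red bird with a short beak"
inputs = tokenizer(caption, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# Use the [CLS] token's hidden state as a fixed-size sentence embedding.
text_embedding = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)

# Condition the generator by concatenating the embedding with a noise vector.
noise = torch.randn(1, 100)
generator_input = torch.cat([noise, text_embedding], dim=1)  # shape: (1, 868)
```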

Cited by 5 publications (2 citation statements) · References 21 publications
“…Since then, CNNs have largely adopted the LeNet architecture, which consists of three main components: a convolution layer, a pooling layer, and an FC layer. Feature maps are obtained through convolution and pooling operations; they are then flattened into one-dimensional vectors and fed into the FC layer, where binary or multi-class classification is performed by the classification layer [13]. This section provides a brief explanation of the principle and purpose of each CNN component, using the structure of LeNet as an example [14].…”
Section: A Convolutional Neural Network
confidence: 99%
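To make the convolution/pooling/FC pipeline in that statement concrete, here is a minimal LeNet-style sketch in PyTorch. Layer sizes follow the classic LeNet-5 layout for 32x32 grayscale input; they are illustrative assumptions, not taken from the cited paper.

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28 feature maps
            nn.Tanh(),
            nn.AvgPool2d(2),                  # pooling: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # pooling: 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                     # flatten feature maps to a 1-D vector
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),       # classification layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage: a batch of one 32x32 grayscale image -> class logits.
logits = LeNet()(torch.randn(1, 1, 32, 32))  # shape: (1, 10)
```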
“…This inspires us to improve BERT's use in NMT. The BERT-fused model first uses BERT to extract representations for an input sequence, and then uses attention mechanisms to fuse those representations with every layer of the encoder and decoder of the NMT model [9, 10].…”
Section: Introduction
confidence: 99%
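The following is a minimal sketch of the fusion step that statement describes: an NMT encoder layer attends both to its own hidden states (self-attention) and to BERT representations of the source sentence (BERT-encoder attention), then combines the two. The dimensions, the averaging scheme, and the class name are assumptions for illustration, not the exact published BERT-fused architecture.

```python
import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, bert_dim: int = 768, nhead: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Cross-attention over BERT outputs; kdim/vdim let BERT keep its own width.
        self.bert_attn = nn.MultiheadAttention(
            d_model, nhead, kdim=bert_dim, vdim=bert_dim, batch_first=True
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, bert_out: torch.Tensor) -> torch.Tensor:
        h_self, _ = self.self_attn(x, x, x)
        h_bert, _ = self.bert_attn(x, bert_out, bert_out)
        # Fuse the two attention outputs, then apply a residual connection.
        return self.norm(x + 0.5 * (h_self + h_bert))

# Usage: 7 source tokens in the NMT layer, 9 BERT subword states.
layer = BertFusedEncoderLayer()
fused = layer(torch.randn(1, 7, 512), torch.randn(1, 9, 768))  # (1, 7, 512)
```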