2020
DOI: 10.3390/app10175978
Text Augmentation Using BERT for Image Captioning

Abstract: Image captioning is an important task for improving human-computer interaction, as well as for a deeper understanding of the mechanisms underlying image description by humans. In recent years, this research field has developed rapidly and a number of impressive results have been achieved. The typical models are based on neural networks, including convolutional networks for encoding images and recurrent networks for decoding them into text. Moreover, attention mechanisms and transformers are actively used for …

Cited by 24 publications
(15 citation statements)
References 37 publications
“…Additionally, paper [21] employed object spatial relationship modeling for image captioning within a transformer encoder-decoder architecture, incorporating an object relation module into the transformer encoder. Paper [22] proposed augmenting the image captions in a dataset, including augmentation using BERT, to improve solutions to the image captioning problem. Furthermore, paper [23] utilized a two-stream transformer-based architecture: one stream for the visual part and another for the textual part.…”
Section: Image Captioning Using Transformer
confidence: 99%
“…Studies have shown that word embedding models, such as word2vec [34], can be competitive for synonym replacement tasks by leveraging the previously mentioned linguistic resources [35]. On the other hand, alongside synonym replacement and word embeddings, recent studies use context-based language models to perform textual data augmentation [32], [36], [37]. One example of such a model is BERT (Bidirectional Encoder Representations from Transformers) [38].…”
Section: Automated Detection Of Reminiscence From Everyday Conversations
confidence: 99%
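To make the embedding-based synonym replacement described above concrete, here is a minimal, self-contained sketch. The toy vectors and the `nearest_synonym`/`augment` helpers are hypothetical stand-ins for a trained word2vec model; none of this code comes from the cited papers.

```python
import math
import random

# Toy word vectors standing in for a trained word2vec model (illustrative values).
EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.3],
    "glad":  [0.85, 0.15, 0.35],
    "sad":   [-0.8, 0.2, 0.1],
    "dog":   [0.1, 0.9, -0.4],
    "puppy": [0.12, 0.88, -0.35],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_synonym(word, min_sim=0.95):
    """Return the closest other vocabulary word if it is similar enough, else None."""
    if word not in EMBEDDINGS:
        return None
    best, best_sim = None, min_sim
    for cand, vec in EMBEDDINGS.items():
        if cand == word:
            continue
        sim = cosine(EMBEDDINGS[word], vec)
        if sim > best_sim:
            best, best_sim = cand, sim
    return best

def augment(sentence, rng=random.Random(0)):
    """Randomly replace words with their nearest embedding neighbour (prob. 0.5)."""
    out = []
    for w in sentence.split():
        syn = nearest_synonym(w)
        out.append(syn if syn and rng.random() < 0.5 else w)
    return " ".join(out)
```

The similarity threshold keeps unrelated words (e.g. "happy" vs. "dog") from being swapped; in practice the threshold and replacement probability are tuning knobs of the augmentation pipeline.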
“…BERT makes full use of a large amount of unsupervised text for self-supervised learning and encodes linguistic knowledge. Experiments demonstrate its superior performance on various NLP tasks [1]. Word2vec is a technique used to compute word vectors [2].…”
Section: Introduction
confidence: 99%
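The context-based augmentation attributed to BERT in these excerpts amounts to mask-and-fill: mask a word and let a language model propose context-appropriate replacements. The sketch below imitates that idea with simple context counts over a tiny corpus; the `CORPUS`, `fill_mask`, and `augment` names are hypothetical, and the counting model is only a stand-in for a real masked language model such as BERT.

```python
import random
from collections import Counter

# A tiny corpus standing in for BERT's pretraining data (illustrative only).
CORPUS = [
    "a dog runs in the park",
    "a puppy runs in the park",
    "a dog plays in the garden",
    "a cat sleeps on the sofa",
]

# Count which words appear between each (left, right) context pair; a real
# masked language model scores candidates with a deep network instead.
context_counts = {}
for sent in CORPUS:
    words = sent.split()
    for i in range(1, len(words) - 1):
        key = (words[i - 1], words[i + 1])
        context_counts.setdefault(key, Counter())[words[i]] += 1

def fill_mask(left, right):
    """Return candidate fillers for '<left> [MASK] <right>', most frequent first."""
    counts = context_counts.get((left, right), Counter())
    return [w for w, _ in counts.most_common()]

def augment(sentence, rng=random.Random(0)):
    """Mask one interior word and replace it with an alternative filler, if any."""
    words = sentence.split()
    if len(words) < 3:
        return sentence
    i = rng.randrange(1, len(words) - 1)
    for cand in fill_mask(words[i - 1], words[i + 1]):
        if cand != words[i]:
            return " ".join(words[:i] + [cand] + words[i + 1:])
    return sentence
```

Because the filler is chosen from words actually observed in that context, the augmented sentence stays fluent, which is the advantage contextual models hold over context-free synonym lists.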