2021 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream)
DOI: 10.1109/estream53087.2021.9431465
Pretrained Word Embeddings for Image Captioning

Cited by 5 publications (4 citation statements). References 19 publications.
“…In addition, the previous model (Bi-LSTM_{G,+M}) [4] … Moreover, in recent work [3,4], the authors found that a small dataset such as Flickr8K makes it difficult to train deep models because of insufficient data. Nonetheless, the proposed Bi-LSTM_{E,+W} model substantially outperforms on all metrics for both word- and syllable-segmentation tasks, even though the corpus is small (around 50,460 sentences for 10K images), compared with other models, namely GRU_E, Bi-GRU_E, LSTM_E, Bi-LSTM_G, Bi-LSTM_N, Bi-LSTM_E, the baseline models [1,2], and the state-of-the-art models [3,4,5].…”
Section: Comparison With State-of-the-art Methods
confidence: 99%
“…ResNet101 is used as the feature-extraction model, and a standard LSTM with one cell is used as the decoder. Pretrained Word2Vec and GloVe vector representations are compared on the MSCOCO dataset [5]. The model with GloVe vectors achieved better results than the model with Word2Vec, since GloVe vectors are built from the co-occurrence of word pairs across the entire corpus, which suits image captioning.…”
Section: Related Work
confidence: 99%
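The pipeline described above (ResNet101 features feeding an LSTM decoder whose embedding layer is initialized from pretrained vectors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary, the 50-d random matrix standing in for GloVe/Word2Vec rows, and the 2048-d random vector standing in for a ResNet101 feature are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Stand-ins (assumptions): a toy vocabulary and random "pretrained" rows;
# a real run would load GloVe or Word2Vec vectors from their published files.
vocab_size, embed_dim, hidden = 6, 50, 64
pretrained = torch.randn(vocab_size, embed_dim)

class CaptionDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Embedding layer initialized from the pretrained matrix.
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.init_h = nn.Linear(2048, hidden)  # map image feature to initial state
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feat, tokens):
        h0 = torch.tanh(self.init_h(feat)).unsqueeze(0)  # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        x = self.embed(tokens)                 # (batch, seq, embed_dim)
        y, _ = self.lstm(x, (h0, c0))          # (batch, seq, hidden)
        return self.out(y)                     # (batch, seq, vocab_size)

feat = torch.randn(1, 2048)          # stand-in for a ResNet101 feature vector
tokens = torch.tensor([[1, 2, 3]])   # toy caption prefix
logits = CaptionDecoder()(feat, tokens)
print(logits.shape)  # torch.Size([1, 3, 6])
```

The decoder emits one vocabulary distribution per input token; swapping `pretrained` for actual GloVe versus Word2Vec rows is the comparison the cited work performs.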
“…Other studies, such as Vinyals et al. [18], found that using pre-trained embeddings does not improve model performance. Atliha et al. [19] used pre-trained GloVe embeddings with fine-tuning and found that they can improve model performance and are well suited to improving the training of image-captioning models.…”
Section: LSTM MS COCO
confidence: 99%
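The distinction the two cited studies turn on — using pretrained embeddings as fixed features versus fine-tuning them during training — reduces to a single flag in most frameworks. A minimal sketch (the 4×50 random matrix is a hypothetical stand-in for real GloVe rows):

```python
import torch
import torch.nn as nn

# Stand-in "pretrained" matrix; real GloVe vectors would be loaded from file.
pretrained = torch.randn(4, 50)

# Fixed embeddings, as in approaches that do not fine-tune (e.g. [18]).
frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Fine-tuned embeddings, updated by backprop during training, as in [19].
tuned = nn.Embedding.from_pretrained(pretrained, freeze=False)

print(frozen.weight.requires_grad, tuned.weight.requires_grad)  # False True
```

With `freeze=True` the embedding weights receive no gradient, so the caption model adapts only its other parameters around the fixed word vectors; with `freeze=False` the vectors themselves drift toward the captioning corpus.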