2022
DOI: 10.1155/2022/4001460

Automatic Image Caption Generation Based on Some Machine Learning Algorithms

Abstract: This paper is dedicated to machine learning, the branches of machine learning that include methods for solving the problem of automatic image description generation, and the practical implementation of a solution to it. Automatic image caption generation is one of the frequent goals of computer vision. To succeed at this task, image description generation models must solve a number of complex subproblems. The objects in the image must be detected and recognized, after which a logical and …
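The abstract frames captioning as staged work: first recognize what is in the image, then produce a description. Many captioning models carry out the second stage one word at a time, conditioned on image features. The following is a minimal sketch of that generic loop as greedy decoding, not the method of this paper. It assumes a hypothetical `score_next` function standing in for a trained decoder (for example, an LSTM fed CNN features); `greedy_caption`, `toy_scorer`, and `TABLE` are illustrative names, and the toy table ignores the image entirely.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical decoder interface: given image features and the words generated
# so far, return a score for each candidate next word.
Scorer = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(
    image_features: Sequence[float],
    score_next: Scorer,
    start: str = "<start>",
    end: str = "<end>",
    max_len: int = 20,
) -> str:
    """Build a caption word by word, always taking the highest-scoring word.

    `image_features` stands in for a visual encoder's output and `score_next`
    for a language decoder; both are placeholders, not a real model.
    """
    tokens: List[str] = [start]
    for _ in range(max_len):
        scores = score_next(image_features, tokens)
        best = max(scores, key=scores.get)  # greedy choice of the next word
        if best == end:
            break
        tokens.append(best)
    return " ".join(tokens[1:])  # drop the start marker


# Toy "decoder": a fixed word-transition table keyed by the last word.
TABLE: Dict[str, Dict[str, float]] = {
    "<start>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.8, "cat": 0.2},
    "dog": {"runs": 0.7, "<end>": 0.3},
    "runs": {"<end>": 1.0},
}


def toy_scorer(features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    # Ignores the features; a real model would condition on them.
    return TABLE[tokens[-1]]


print(greedy_caption([0.0], toy_scorer))  # → a dog runs
```

Greedy decoding is used here only because it is the simplest search strategy to show; practical systems often use beam search instead, which keeps several partial captions alive at each step.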

Cited by 7 publications (3 citation statements)
References 28 publications
“…In recent years, machine learning has been used for multiple text-analysis tasks, enabling complex analysis of text. These include automatic document classification [Kadhim, 2019; Kowsari et al., 2019], text generation [Gatt and Krahmer, 2018; de Rosa and Papa, 2021], text summarization [Gambhir and Gupta, 2017; El-Kassas et al., 2021], sentiment analysis [Zhang et al., 2018; Yadav and Vishwakarma, 2020], and automatic caption generation [Bai and An, 2018; Hossain et al., 2019; Predić et al., 2022].…”
Section: Discussion (mentioning, confidence: 99%)
“…Next, in the second stage, the Querying Transformer is pre-trained for vision-to-language generative learning using a frozen Large Language Model (LLM). The authors in [20,21] presented work based on CNNs and LSTMs integrated with ML algorithms. The authors in [22] presented work on generating captions for a text summarization technique using a pre-trained ML-based algorithm.…”
Section: Related Work (mentioning, confidence: 99%)
“…To achieve pre-training of a unified vision-language model that combines comprehension and generation abilities, the Bootstrapping Language-Image Pre-training (BLIP) model introduces a multimodal encoder-decoder architecture. This architecture serves three key functions [19,20]:…”
Section: Bootstrapping Process For Language-image Pretraining Model (mentioning, confidence: 99%)