2022
DOI: 10.48084/etasr.4772
|View full text |Cite
|
Sign up to set email alerts
|

D-CNN: A New model for Generating Image Captions with Text Extraction Using Deep Learning for Visually Challenged Individuals

Abstract: Automatically describing the information of an image using properly constructed sentences is a tricky task in any language. However, it has the potential to have a significant effect by enabling visually challenged individuals to better understand their surroundings. This paper proposes an image captioning system that generates detailed captions and extracts text from an image, if any, and uses it as a part of the caption to provide a more precise description of the image. To extract the image features, the pr… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
5
0
1

Year Published

2022
2022
2024
2024

Publication Types

Select...
7
3

Relationship

1
9

Authors

Journals

citations
Cited by 21 publications
(6 citation statements)
references
References 23 publications
0
5
0
1
Order By: Relevance
“…The dataset contains some images which contain text in the form of a banner or poster as shown in Figure 12. Using this dataset, we proposed a new deep learning model [10] for performing image captioning and text extraction. The obtained results of image captioning, including textual information present in the image, were satisfactory, giving accuracy up to 83% and performed as well as the state of the art methods.…”
Section: A Resultsmentioning
confidence: 99%
“…The dataset contains some images which contain text in the form of a banner or poster as shown in Figure 12. Using this dataset, we proposed a new deep learning model [10] for performing image captioning and text extraction. The obtained results of image captioning, including textual information present in the image, were satisfactory, giving accuracy up to 83% and performed as well as the state of the art methods.…”
Section: A Resultsmentioning
confidence: 99%
“…Besides, the generated textual sentence can be transformed into speech. Bhalekar et al [12] devise an image captioning mechanism that produces comprehensive captions, derives text from imagery, if any, and utilizes it as a part of the caption to offer a highly accurate description of the imagery. The devised method will use LSTM and CNNs to extract the image features to produce respective sentences related to the learned image features.…”
Section: Related Workmentioning
confidence: 99%
“…They can also accurately localize a portion of an image or data cluster using tools such as regression. In our efforts to deploy a model of CNN compatible with image extraction, we took inspiration from literature such as [14][15][16]. The main methodology applied in recent literature for detection networks are:  Direct detection [14,15]  Regional detection [13,17]…”
Section: Convolutional Neural Networkmentioning
confidence: 99%