2019
DOI: 10.11591/ijece.v9i4.pp2932-2940
|View full text |Cite
|
Sign up to set email alerts
|

Natural language description of images using hybrid recurrent neural network

Abstract: We presented a learning model that generated natural language description of images. The model utilized the connections between natural language and visual data by produced text line based contents from a given image. Our Hybrid Recurrent Neural Network model is based on the intricacies of Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bi-directional Recurrent Neural Network (BRNN) models. We conducted experiments on three benchmark datasets, e.g., Flickr8K, Flickr30K, and MS COCO. Our … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
5
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
5
1

Relationship

0
6

Authors

Journals

citations
Cited by 7 publications
(5 citation statements)
references
References 32 publications
0
5
0
Order By: Relevance
“…b. Accuracy, as a performance metric (2), assesses the proportion of correct predictions relative to the total predictions made by a model [20]. In essence, it quantifies the ratio of accurately predicted outcomes to the overall number of predictions, offering a straightforward indication of a model's correctness.…”
Section: Scoring Metricsmentioning
confidence: 99%
“…b. Accuracy, as a performance metric (2), assesses the proportion of correct predictions relative to the total predictions made by a model [20]. In essence, it quantifies the ratio of accurately predicted outcomes to the overall number of predictions, offering a straightforward indication of a model's correctness.…”
Section: Scoring Metricsmentioning
confidence: 99%
“…They are scalable, powerful, and versatile, ideal to address highly complex large machine learning tasks, such as video recommending, speech recognition services, and classifying images [18]. An interesting algorithm with an ANN component, such as the convolution neural network (CNN), focuses on image recognition, and recurrent neural network (RNN) [19].…”
Section: Related Workmentioning
confidence: 99%
“…The first thing we did is annotating those images which is a key technique used to create training data for computer vision. In order that the model perceives objects in their surroundings and automatically assign it a caption, annotated images are needed to train the model to learn to see an area full of objects as we do [21]. We used for that, bounding box technique that requires labellers to draw a box as close as possible to the edges of key objects within the image Figure 3, and stored the top-left and bottom-right points in the corresponding txt file respecting the syntax of class_id x y width height [22].…”
Section: Loading and Data Preparationmentioning
confidence: 99%