2018
DOI: 10.14569/ijacsa.2018.090338

An Effective Automatic Image Annotation Model Via Attention Model and Data Equilibrium

Abstract: Nowadays, a huge number of images are available. However, retrieving a required image is a challenging task for an ordinary user of computer vision systems. Over the past two decades, much research has been introduced to improve the performance of automatic image annotation, traditionally focused on content-based image retrieval. However, recent research demonstrates that there is a semantic gap between content-based image retrieval and the image semantics understandable by humans. A…


Cited by 8 publications (5 citation statements)
References 42 publications
“…Furthermore, the F1 score obtained by our method is at least 10% higher than that obtained by other recent studies such as GCN (2020) [73], SSL-AWF (2021) [81], and MVRSC (2021) [82]. Now, if we look at the scenario of 374 concepts, we can see that our proposed method has surpassed all other methods except for that of Vatani et al (2020) [85]. However, if we consider the method of Vatani et al in terms of N+, we can see that our method outperforms it by eight concepts.…”
Section: Scenario 2: Comparing Our Methods to the State of the Art (contrasting)
confidence: 42%
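The citation statement above compares methods by mean F1 and by N+, the number of concepts recalled at least once — both standard metrics in automatic image annotation evaluation. As a minimal illustrative sketch (function names and toy data are ours, not from the cited papers), per-concept precision, recall, mean F1, and N+ can be computed like this:

```python
# Hedged sketch: per-concept precision/recall, mean F1, and N+ (concepts
# with at least one correct prediction) for image annotation evaluation.
# Data and names are illustrative, not taken from the cited studies.

def annotation_metrics(true_tags, pred_tags, concepts):
    """true_tags/pred_tags: one set of concept labels per image."""
    precisions, recalls, f1s = [], [], []
    n_plus = 0
    for c in concepts:
        tp = sum(1 for t, p in zip(true_tags, pred_tags) if c in t and c in p)
        n_pred = sum(1 for p in pred_tags if c in p)   # images predicted with c
        n_true = sum(1 for t in true_tags if c in t)   # images truly tagged c
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
        if tp > 0:
            n_plus += 1           # concept recalled at least once
    n = len(concepts)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, n_plus

# toy example: 3 images, vocabulary of 4 concepts
truth = [{"sky", "sea"}, {"sky"}, {"tree"}]
preds = [{"sky"}, {"sky", "tree"}, {"tree"}]
p, r, f1, nplus = annotation_metrics(truth, preds, ["sky", "sea", "tree", "car"])
```

A difference of "eight concepts" in N+, as cited above, means eight more vocabulary labels were correctly predicted for at least one test image.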
“…Ivasic-Kos et al [12] proposed a framework based on semantic and discriminative classification. Vatani et al [13] proposed a deep learning model whose method comprises three phases: feature extraction, tag generation, and image annotation. This model attempts to resolve the problem of imbalanced data in image annotation.…”
Section: Nearest Neighbor-based Model (mentioning)
confidence: 99%
“…A few famous novels are taken as samples to train an LSTM that generates long sentence sequences in the writing style of the novel. Amir Vatani et al [11] proposed image annotation with low-level and high-level feature extraction, tag generation, and annotation, which mitigates the common issue in image annotation known as the semantic gap. Creating the Jacobian matrix in visual control systems is challenging, which opens the door to estimation error, observation error, and filter error.…”
Section: Related Work (mentioning)
confidence: 99%
“…Later, deep learning models were introduced and have been shown in various research papers [42,43,44] to perform better because the network learns by itself. Recently, deep learning methods have come to dominate this research area, slowly transforming annotation into captioning. In deep learning, annotation models can be classified as annotating with tags [10,11,14,20,29,34], finding sequences of words [5,9,13,15,22,24,25,27], captioning [17,18,19,21,26], and classification [6,7,14,16]. In the first model, tags are assigned to images based on extracted features or objects in the image. In the second, related tags are identified and arranged as a sequence, whereas the third model combines related tags with natural language processing to produce a meaningful sentence, called captioning. In deep learning annotation models, the Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) perform well at encoding image features and decoding them into a natural language representation [2]. Later, Long Short-Term Memory (LSTM) was introduced to preserve dependencies for future reference; it is good at natural language generation [13], which an RNN cannot manage.…”
Section: Introduction (mentioning)
confidence: 99%
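The citation statement above describes the generic CNN-encoder / LSTM-decoder pattern for captioning. As a minimal sketch of that pattern only (all weights random, every name here is illustrative; a real system would use a trained CNN and learned parameters), the decode loop can be written out explicitly:

```python
# Hedged sketch of the CNN -> LSTM encode/decode captioning pattern.
# The "encoder" is a stand-in projection of an image-feature vector;
# the LSTM cell is written out by hand with random, untrained weights.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<start>", "<end>", "sky", "sea", "tree"]
H, D = 16, 8                                  # hidden size, embedding size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# one weight matrix over the concatenation [h; x] per LSTM gate
W = {g: rng.normal(0, 0.1, (H, H + D)) for g in ("i", "f", "o", "c")}
b = {g: np.zeros(H) for g in ("i", "f", "o", "c")}
E = rng.normal(0, 0.1, (len(VOCAB), D))       # word embeddings
W_out = rng.normal(0, 0.1, (len(VOCAB), H))   # hidden -> vocabulary logits

def lstm_step(h, c, x):
    z = np.concatenate([h, x])
    i = sigmoid(W["i"] @ z + b["i"])          # input gate
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate
    o = sigmoid(W["o"] @ z + b["o"])          # output gate
    c = f * c + i * np.tanh(W["c"] @ z + b["c"])
    h = o * np.tanh(c)
    return h, c

def greedy_decode(image_feature, max_len=5):
    # "encode": project the CNN feature vector into the initial hidden state
    h = np.tanh(image_feature[:H])
    c = np.zeros(H)
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h, c = lstm_step(h, c, E[word])       # feed previous word back in
        word = int(np.argmax(W_out @ h))      # greedy choice of next word
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

caption = greedy_decode(rng.normal(size=512))
```

Tag-based annotation differs only in the output head: instead of decoding a word sequence, each vocabulary label is scored and thresholded independently.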