2023
DOI: 10.3390/s23031286

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

Abstract: Research related to fashion and e-commerce domains is gaining attention in computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently-proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed with the integration of external textual memory that could be access…

Cited by 11 publications (9 citation statements)
References 53 publications
“…This demonstrates the success of training the model utilizing image attributes in addition to the image features and linguistic features and represents an improvement of more than 30 BLEU points over the model that does not use attributes and other related models. For example, our model outperforms [22] with more than 40 BLEU points and [23] with more than 60 BLEU points.…”
Section: Results (mentioning)
confidence: 89%
“…For the second category, it considers the models developed for the fashion domain which includes [20] that explored training the model using diversity datasets [1] which proposed a smaller 5-layer Convolutional Neural Network (CNN-5) to extract image features, Semantic Rewards guided Fashion Captioning (SRFC) [21]. It also includes [22] which explored taking the semantic attributes from the users, and [23] which developed a transformer-based model. As can be observed, according to BLEU, the proposed approach performs better than all of the methods that were compared.…”
Section: Results (mentioning)
confidence: 99%
“…Finally, two papers focus on the problem of outward appearance and fashion. In particular, Fontanini et al [12] propose a method for transferring clothing styles across images of people, while Moratelli et al [13] propose an image captioning approach for fashion retrieval applications.…”
Section: Overview Of Contribution (mentioning)
confidence: 99%
“…With the development of Artificial Intelligence, image captioning techniques have been increasingly applied in various fields. Such as medicine [5,28], fashion and e-commerce [19], aided industry [36], and tourism [4]. Moreover, this technology shows immense potential in traffic applications.…”
Section: Introduction (mentioning)
confidence: 99%