2016
DOI: 10.17706/ijcee.2016.8.1.31-43
Fashion Meets Computer Vision and NLP at e-Commerce Search

Abstract: In this paper, we focus on cross-modal (visual and textual) e-commerce search within the fashion domain. Particularly, we investigate two tasks: 1) given a query image, we retrieve textual descriptions that correspond to the visual attributes in the query; and 2) given a textual query that may express an interest in specific visual product characteristics, we retrieve relevant images that exhibit the required visual attributes. To this end, we introduce a new dataset that consists of 53,689 images cou…

Cited by 12 publications (14 citation statements)
References 24 publications (25 reference statements)
“…Image search aims to retrieve images relevant to a textual or visual query. In [18], Zoghbi et al. experiment with canonical correlation analysis (CCA) and bilingual latent Dirichlet allocation (BiLDA) to realize cross-modal fashion search from a textual query to images and from an image query to text. In [12], cross-modal search of fashion items is achieved with intermodal representations for visual and textual fashion attributes, inferred with a neural network alignment model.…”
Section: Related Work
confidence: 99%
“…E-commerce product descriptions tend to be noisy and incomplete. To acquire the textual fashion attributes, we follow the approach of Zoghbi et al. [18] and filter the product descriptions with the glossary of the online clothing shop Zappos. This process retains only fashion-related phrases and removes most noise.…”
Section: Visual and Textual Attribute Detection
confidence: 99%
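The glossary-filtering step described above amounts to keeping only tokens that appear in a fashion vocabulary. A minimal sketch, assuming a tiny hand-made glossary in place of the actual Zappos glossary:

```python
import re

# Hypothetical mini-glossary standing in for the Zappos fashion glossary.
glossary = {"sleeve", "collar", "denim", "floral", "v-neck", "lace"}

def filter_description(text, glossary):
    """Tokenize a product description and keep only glossary terms."""
    tokens = re.findall(r"[a-z\-]+", text.lower())
    return [t for t in tokens if t in glossary]

desc = "Beautiful floral dress with lace sleeve details, ships worldwide!"
print(filter_description(desc, glossary))  # ['floral', 'lace', 'sleeve']
```

Marketing language ("Beautiful", "ships worldwide") is dropped, leaving the fashion-attribute phrases the retrieval model actually needs.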
“…In both cases the final aim is to recognize the image content in terms of depicted objects [7] or scene [5]. Previous work relies mostly on content-based strategies that either predict the text from the images by training a model of their relationship [16,17,18,19] or propagate tags through a k-nearest-neighbor method [4,15]. In these studies, the analysis is generally performed on data collections where the ground-truth image annotations are well defined, with a large consensus among users or experts.…”
Section: Related Work
confidence: 99%
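The k-nearest-neighbor tag propagation mentioned above can be sketched as voting tags from the nearest labeled images. This toy example uses made-up 2-D features and tag sets; the feature space and tags in [4,15] are of course richer.

```python
import numpy as np

# Toy labeled collection: a feature vector and a tag set per image.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
tags = [{"dress", "red"}, {"dress"}, {"boots"}, {"boots", "leather"}]

def propagate_tags(query, feats, tags, k=2):
    """Vote tags from the k nearest labeled neighbors (Euclidean distance)."""
    dists = np.linalg.norm(feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        for t in tags[i]:
            votes[t] = votes.get(t, 0) + 1
    # Most-voted tags first.
    return sorted(votes, key=lambda t: -votes[t])

print(propagate_tags(np.array([0.05, 0.02]), feats, tags))  # ['dress', 'red']
```

A query near the first two items inherits their shared tag ("dress") ahead of a tag only one neighbor carries ("red").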