2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020
DOI: 10.1109/cvpr42600.2020.00307

Image Search With Text Feedback by Visiolinguistic Attention Learning

Abstract: Image search with text feedback has promising impacts in various real-world applications, such as e-commerce and internet search. Given a reference image and text feedback from user, the goal is to retrieve images that not only resemble the input image, but also change certain aspects in accordance with the given text. This is a challenging task as it requires the synergistic understanding of both image and text. In this work, we tackle this task by a novel Visiolinguistic Attention Learning (VAL) framework. S…

Cited by 140 publications (116 citation statements)
References 63 publications
“…In order to verify the effectiveness of proposed methods, we compare state-of-the-art methods [8,26,7,27,23,9] on three datasets and cite performance scores from the original papers. However, we notice that CNN backbone used by these methods vary, so we train our models with ResNet18 or MobileNet for a fair comparison.…”
Section: Comparison With the State-of-the-art
Citation type: mentioning
confidence: 99%
“…In [8], Vo et al learn a gating connection to retain the query image feature, and additionally employ a residual connection to fuse images and text. More recently, Chen et al [9] Fig. 1.…”
Section: Introduction
Citation type: mentioning
confidence: 95%
“…Cross-modal retrieval is a classical task at the intersection between computer vision and natural language processing, and has been widely explored [1,2,3,4]. Recently, with the increasing popularity of e-commerce platforms [5,6,7,8], language-based product image retrieval attracts increasing attention [9,10,11]. As exemplified in Fig.…”
Section: Introduction
Citation type: mentioning
confidence: 99%