Image Search With Text Feedback by Visiolinguistic Attention Learning

Chen, Yanbei; Gong, Shaogang; Bazzani, Loris

doi:10.1109/cvpr42600.2020.00307

Cited by 140 publications

(116 citation statements)

References 63 publications

Citation statements by type: Mentioning 116 (no counts shown for Supporting or Contrasting)

“…In order to verify the effectiveness of proposed methods, we compare state-of-the-art methods [8,26,7,27,23,9] on three datasets and cite performance scores from the original papers. However, we notice that CNN backbone used by these methods vary, so we train our models with ResNet18 or MobileNet for a fair comparison.…”

Section: Comparison With the State-of-the-art (mentioning)

confidence: 99%

“…In [8], Vo et al learn a gating connection to retain the query image feature, and additionally employ a residual connection to fuse images and text. More recently, Chen et al [9] Fig. 1.…”

Section: Introduction (mentioning)

confidence: 95%

“…The main challenge for this task is how to learn a composite representation that captures both visual and semantic information. Previous studies have provided some approaches [7,8,9,10]. For instance, Santoro et al [7] first concatenate text features and image features simply, and a multilayer perceptron is further employed to fuse the cross-modal data.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, the above works have not fully utilized the multi-level feature. Inspired by [9] that explores multiple multi-level features from convolutional neural networks, we go further in this direction by jointly exploring low-order features and high-order features on the basis of multi-level features. Besides, adversarial learning [11] has been proved to be effective in image generation (e.g., image-to-image translation [12,13], text-to-image generation [14]), as well as facilitating the fusion of image and text features [15].…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Order Adversarial Representation Learning for Composed Query Image Retrieval

Chen, Dong, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper targets at a task of composed query image retrieval. Given a composed query consists of a reference image and modification text, the task aims to retrieve images which are generally similar to the reference image but differ according to the given modification text. The task is challenging, due to the complexity of the composed query and cross-modality characteristics between the query and candidate images. The common paradigm for the task is to first obtain fused feature of the reference image and the text, and further project them into a common embedding space with candidate images. However, the majority of works usually only aim for the representation of high level, ignoring the low-level representation which may be complementary to the high-level representation. So this paper proposes a new Multi-order Adversarial Network (MAN) which uses multilevel representations and simultaneously explores their low-order and high-order interactions, obtaining low-order and high-order features. The low-order features reflect the pattern of itself and high-order features contains the interaction between features. Moreover, we further introduce an adversarial module to constrain the fusion of the reference image and the text. Extensive experiments on three datasets verify the effectiveness of our MAN and also demonstrate its state-of-the-art performance.
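
The "common paradigm" described in the abstract above (fuse the reference image and modification text, then compare against candidates in a shared embedding space) can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature vectors are made up, the fusion is a plain element-wise sum rather than the learned fusion of MAN or any cited method, and the names `fuse` and `rank_candidates` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_feat, text_feat):
    """Placeholder fusion (assumption): element-wise sum, then L2-normalize.

    Real systems learn this step; the sum only stands in for it here.
    """
    return l2_normalize([i + t for i, t in zip(image_feat, text_feat)])


def rank_candidates(image_feat, text_feat, candidates):
    """Rank candidate images by cosine similarity to the fused query."""
    query = fuse(image_feat, text_feat)
    scores = {
        name: sum(q * c for q, c in zip(query, l2_normalize(feat)))
        for name, feat in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy features (made up): axis 0 = reference content, axis 1 = requested change.
reference = [1.0, 0.0, 0.0]
modification = [0.0, 1.0, 0.0]
gallery = {
    "same_as_reference": [1.0, 0.0, 0.0],
    "modified": [1.0, 1.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
ranking = rank_candidates(reference, modification, gallery)
print(ranking)
```

In this toy setup the candidate that keeps the reference content and also reflects the modification ranks first, which is the behavior the task definition asks for.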

“…Cross-modal retrieval is a classical task at the intersection between computer vision and natural language processing, and has been widely explored [1,2,3,4]. Recently, with the increasing popularity of e-commerce platforms [5,6,7,8], language-based product image retrieval attracts increasing attention [9,10,11]. As exemplified in Fig.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Similarity Learning for Language-Based Product Image Retrieval

Liu, Dong, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper aims for the language-based product image retrieval task. The majority of previous works have made significant progress by designing network structure, similarity measurement, and loss function. However, they typically perform vision-text matching at certain granularity regardless of the intrinsic multiple granularities of images. In this paper, we focus on the cross-modal similarity measurement, and propose a novel Hierarchical Similarity Learning (HSL) network. HSL first learns multi-level representations of input data by stacked encoders, and object-granularity similarity and image-granularity similarity are computed at each level. All the similarities are combined as the final hierarchical cross-modal similarity. Experiments on a large-scale product retrieval dataset demonstrate the effectiveness of our proposed method. Code and data are available at https://github.com/liufh1/hsl.
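
The idea of computing two similarities per level and combining them can be illustrated with a small sketch. It is not the HSL implementation: the abstract does not say how object-level similarity is computed or how the similarities are combined, so the max-over-objects cosine score and the unweighted mean below are assumptions, as are all function names and the toy vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def level_similarities(text_vec, object_vecs, image_vec):
    """Return (object-granularity, image-granularity) similarity for one level.

    Assumption: object-granularity similarity is the best match over objects.
    """
    object_sim = max(cosine(text_vec, obj) for obj in object_vecs)
    image_sim = cosine(text_vec, image_vec)
    return object_sim, image_sim


def hierarchical_similarity(levels):
    """Combine all per-level similarities; an unweighted mean is assumed."""
    sims = []
    for text_vec, object_vecs, image_vec in levels:
        sims.extend(level_similarities(text_vec, object_vecs, image_vec))
    return sum(sims) / len(sims)


# Two toy levels: (text vector, object vectors, whole-image vector).
levels = [
    ([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]),
    ([0.0, 1.0], [[1.0, 0.0]], [0.0, 1.0]),
]
score = hierarchical_similarity(levels)
print(round(score, 4))
```

A learned weighting over levels and granularities would be a natural alternative to the plain mean used here.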

Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval

Chen, Bazzani 2020

Lecture Notes in Computer Science
