Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413864

Hierarchical Gumbel Attention Network for Text-based Person Search

Abstract: Text-based person search aims to retrieve the pedestrian images that best match a given textual description from gallery images. Previous methods utilize the soft-attention mechanism to infer the semantic alignments between the regions of image and the corresponding words in sentence. However, these methods may fuse the irrelevant multi-modality features together which cause matching redundancy problem. In this work, we propose a novel hierarchical Gumbel attention network for text-based person search via Gumb…

Cited by 77 publications (47 citation statements)
References 29 publications
“…Recently, many deep person re-identification methods improve the performance by exploring fine-grained and discriminative part features of person Fu et al, 2018; Wang et al, 2018a; Zheng et al, 2019]. These methods mainly divide the person images into multiple spatial bins, compute the part-level representations, and utilize extra loss functions for training each part.…”
Section: Part-based Re-id Methods (mentioning)
confidence: 99%
“…Text-based person search is introduced by Li et al [Li et al, 2017b], they collect a large-scale person description dataset, CUHK-PEDES and design a Recurrent Neural Network with Gated Neural Attention mechanism model (GNA-RNN) for this task. Most of the following works [Li et al, 2017a; Chen et al, 2018b; Chen et al, 2018a; Jing et al, 2020; Zheng et al, 2020a] adopt the cross-modality attention mechanism to attend all the image regions of images and the corresponding words in textual description, the core idea is to obtain weighted matching between image and text for alleviating the irrelevant matching. These methods are inefficient and increase the complexity of computation.…”
Section: Text-based Person Search (mentioning)
confidence: 99%
“…However, it requires an additional human parsing network to be trained to obtain the attributes. In contrast to LSTM based networks for textual representations, BERT has been used in [15] and gives promising performance.…”
Section: Attention Based Re-id Methods (mentioning)
confidence: 99%
“…Due to the rapid development of deep neural networks [1,2,3,4], visual surface inspection [5,6] has attracted increasing attention as an important technology in many intelligent industrial applications. Visual surface inspection aims to detect the abnormal regions on the surface of material using visual images.…”
Section: Introduction (mentioning)
confidence: 99%