2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw.2018.00055

Attention in Multimodal Neural Networks for Person Re-identification

Abstract: In spite of increasing interest from the research community, person re-identification remains an unsolved problem. Correctly deciding on a true match by comparing images of a person, captured by several cameras, requires extraction of discriminative features to counter challenges such as changes in lighting, viewpoint and occlusion. Besides devising novel feature descriptors, the setup can be changed to capture persons from an overhead viewpoint rather than a horizontal one. Furthermore, additional modalities can …

Cited by 16 publications (26 citation statements) | References 31 publications
“…In RGB-D-CNN, Lejbolle et al [29] started with a two-flow Convolutional Neural Network (CNN) (one for RGB and one for depth) and a final fusion layer. Then they improved this approach with a multimodal attention network called MAT [30], adding an attention module to extract local and discriminative features that were fused with globally extracted features. In another work, Lejbolle et al [31] presented a SLATT network with two types of attention modules (one spatial and one layer-wise).…”
Section: Results (mentioning)
confidence: 99%
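As a rough illustration of the two-stream design quoted above, the sketch below builds one CNN branch per modality (RGB and depth) and fuses their embeddings in a single layer. Layer widths, the classifier head and the class name TwoStreamRGBD are illustrative assumptions, not the architecture published in [29].

```python
import torch
import torch.nn as nn

class TwoStreamRGBD(nn.Module):
    """Minimal two-stream CNN: one branch for RGB, one for depth,
    followed by a single fusion layer (illustrative sketch only)."""

    def __init__(self, feat_dim=256, num_ids=100):
        super().__init__()

        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )

        self.rgb_branch = branch(3)     # RGB stream
        self.depth_branch = branch(1)   # depth stream
        # Fusion layer: concatenate the two embeddings and project
        self.fusion = nn.Linear(2 * feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_ids)  # identity logits

    def forward(self, rgb, depth):
        f_rgb = self.rgb_branch(rgb)
        f_depth = self.depth_branch(depth)
        fused = self.fusion(torch.cat([f_rgb, f_depth], dim=1))
        return self.classifier(fused), fused
```

A forward pass takes an RGB tensor of shape (B, 3, H, W) and a depth tensor of shape (B, 1, H, W), returning identity logits and the fused embedding that would be used for matching.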
“…In [59], the authors started with a two-flow convolutional neural network (CNN) (one for RGB and one for depth) and a final fusion layer. They improved on this approach with a multi-modal attention network [60], adding an attention module to extract local and discriminative features that were fused with globally extracted features. In another work, Lejbolle et al [61] presented a SLATT network with two types of attention modules (one spatial and one layer-wise).…”
Section: Person Re-identification (mentioning)
confidence: 99%
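The fusion of locally attended and globally pooled features mentioned in this statement could be sketched as follows; the attention map, the pooling scheme and the name LocalGlobalFusion are hypothetical placeholders, not the module from [60].

```python
import torch
import torch.nn as nn

class LocalGlobalFusion(nn.Module):
    """Illustrative fusion of attention-weighted (local) and
    globally pooled features taken from the same convolutional map."""

    def __init__(self, channels=64, out_dim=128):
        super().__init__()
        # 1x1 conv producing a single spatial attention map in [0, 1]
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.proj = nn.Linear(2 * channels, out_dim)

    def forward(self, feat_map):                      # feat_map: (B, C, H, W)
        a = self.attn(feat_map)                       # (B, 1, H, W) attention weights
        # Attention-weighted pooling -> local, discriminative feature
        local = (feat_map * a).flatten(2).sum(-1) / a.flatten(2).sum(-1).clamp_min(1e-6)
        # Plain global average pooling -> global feature
        global_ = feat_map.mean(dim=(2, 3))
        return self.proj(torch.cat([local, global_], dim=1))
```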
“…To our knowledge, the only previous work to consider multimodal attention is the Multimodal ATtention network (MAT) [28]. In this work, spatial attention weights are calculated for different layers of a CNN based on fusion of features from different modalities.…”
Section: Attention in Person Re-identification (mentioning)
confidence: 99%
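A minimal sketch of the mechanism described here, a spatial attention map computed from fused RGB and depth feature maps of one layer and applied back to both modalities; the layer widths and the softmax normalisation are assumptions, not the published MAT implementation [28].

```python
import torch
import torch.nn as nn

class MultimodalSpatialAttention(nn.Module):
    """Sketch: derive one spatial attention map from fused RGB and depth
    feature maps of a given layer, then reweight both modalities with it."""

    def __init__(self, channels=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, 1, 1),
        )

    def forward(self, rgb_map, depth_map):            # both: (B, C, H, W)
        fused = torch.cat([rgb_map, depth_map], dim=1)
        logits = self.score(fused)                    # (B, 1, H, W)
        b, _, h, w = logits.shape
        # Normalise over all spatial locations so weights sum to one
        weights = torch.softmax(logits.view(b, 1, -1), dim=-1).view(b, 1, h, w)
        return rgb_map * weights, depth_map * weights
```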
“…This will be referred to as spatial attention, which is applied in [28] to determine the importance of spatial locations at different layers of a neural network based on fusion of RGB and depth features. Since different layers of a CNN produce features at different abstraction levels [29], features produced by spatial attention modules represent local context information at different abstraction levels.…”
Section: Introduction (mentioning)
confidence: 99%
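To illustrate how spatial attention applied at several layers yields local context at different abstraction levels, the sketch below attaches a small attention head to each stage of a toy backbone and concatenates the attended, pooled features into one descriptor; the backbone, widths and pooling are assumptions rather than the network of [28].

```python
import torch
import torch.nn as nn

class MultiLevelAttention(nn.Module):
    """Sketch: spatial attention at every stage of a small CNN backbone;
    each stage contributes a local feature at its own abstraction level."""

    def __init__(self, in_ch=3, widths=(32, 64, 128)):
        super().__init__()
        self.blocks, self.attns = nn.ModuleList(), nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.blocks.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, stride=2, padding=1), nn.ReLU()))
            self.attns.append(nn.Sequential(nn.Conv2d(w, 1, 1), nn.Sigmoid()))
            prev = w

    def forward(self, x):
        descriptors = []
        for block, attn in zip(self.blocks, self.attns):
            x = block(x)
            a = attn(x)                                # (B, 1, H, W)
            pooled = (x * a).sum(dim=(2, 3)) / a.sum(dim=(2, 3)).clamp_min(1e-6)
            descriptors.append(pooled)                 # per-level local feature
        return torch.cat(descriptors, dim=1)           # multi-level descriptor
```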