2020
DOI: 10.1109/tip.2020.2984883

Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments

Abstract: Description-based person re-identification (Re-id) is an important task in video surveillance that requires discriminative cross-modal representations to distinguish different people. It is difficult to directly measure the similarity between images and descriptions because of modality heterogeneity (the cross-modal problem). Moreover, all samples belong to a single category (the fine-grained problem), which makes this task even harder than conventional image-description matching. In this paper, we propose a Mul…

Cited by 146 publications (82 citation statements)
References: 51 publications

“…We compare the results with feature embedding ( [4, 7, 28]) and attention based methods ( [10,13,29,30]) in Table 2. The results show that our HAITA-Net achieves better performance.…”
Section: Performance Comparison (mentioning; confidence: 99%)
“…The results show that our HAITA-Net achieves better performance. Although MIA [29] and A-GANet [10] also learn the cross-modal representations by global or local associations, they weakly incorporate the importance of the informative word. Further, compared to A-GANet, we observe that for higher CMC ranks our model has significantly better performance.…”
Section: Performance Comparison (mentioning; confidence: 99%)
“…ResNet50 [127] / LSTM [130] / GLIA [82] / CUHK-PEDES [85] / Top-1 acc.; ResNet-50 [127] / Bi-GRU [145,146] / MIA [135] / CUHK-PEDES [85] / Recall@1 (48.00%); Zhang et al [136] 2018…”
Section: Natural Language Description-based Person Retrieval (mentioning; confidence: 99%)
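
The statements above report text-to-image retrieval quality as CMC ranks, top-1 accuracy, and Recall@1. A minimal sketch of how a CMC curve is typically computed follows, assuming a precomputed query-by-gallery similarity matrix and integer person identities. The function name `cmc_curve` and its arguments are hypothetical and not taken from any cited codebase. In description-based person retrieval, the rank-1 value of this curve is what is usually reported as top-1 accuracy or Recall@1.

```python
from typing import Optional, Sequence


def cmc_curve(similarity: Sequence[Sequence[float]],
              query_ids: Sequence[int],
              gallery_ids: Sequence[int],
              max_rank: int = 10) -> list[float]:
    """Return CMC scores for ranks 1..max_rank (hypothetical helper).

    similarity[q][g] scores text query q against gallery image g. A query is a
    hit at rank k if an image of the same person identity appears among its
    k highest-scoring gallery images.
    """
    if not similarity:
        raise ValueError("need at least one query")
    hits = [0] * max_rank
    for q, row in enumerate(similarity):
        # Rank gallery images by descending similarity to this description.
        order = sorted(range(len(row)), key=lambda g: row[g], reverse=True)
        # 0-based position of the first correct match, or None if absent.
        first: Optional[int] = next(
            (pos for pos, g in enumerate(order) if gallery_ids[g] == query_ids[q]),
            None)
        if first is not None and first < max_rank:
            # A hit at rank first+1 also counts for every larger rank.
            for k in range(first, max_rank):
                hits[k] += 1
    return [h / len(similarity) for h in hits]


if __name__ == "__main__":
    # Toy example: 3 text queries, 3 gallery images, identities 0, 1, 2.
    sim = [[0.9, 0.1, 0.3],
           [0.2, 0.4, 0.8],
           [0.5, 0.6, 0.1]]
    cmc = cmc_curve(sim, query_ids=[0, 1, 2], gallery_ids=[0, 1, 2], max_rank=3)
    print(cmc)  # rank-1 = 1/3, rank-2 = 2/3, rank-3 = 1.0
```

Because a hit at rank k is also a hit at every larger rank, the curve is non-decreasing, which is why comparisons "for higher CMC ranks" look at the tail of this list.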
“…The GLIA approach gains a significant boost to the top-1 accuracy from the baselines and achieves 43. Multi-granularity Image-text Alignments (MIA) framework [135] adopts a multiple granularities (i.e., global-global, global-local, and local-local alignments) based approach for better similarity evaluations between text and image. The global context of image and description matches global-global granularity.…”
Section: Natural Language Description-based Person Retrieval (mentioning; confidence: 99%)
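
The MIA description above distinguishes three alignment granularities: global-global, global-local, and local-local. The sketch below combines them into one image-text score under loudly simplified assumptions. Features are assumed to already live in a shared embedding space produced by encoders that are not shown. The global-local term uses plain cosine averaging, the local-local term uses best-match (max) pairing. The three terms are weighted equally by default. The names `multi_granularity_score`, `img_parts`, and `txt_phrases` are hypothetical. This illustrates the general idea only and is not the MIA paper's actual formulation, which this page does not describe.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_granularity_score(img_global: Vector, img_parts: Sequence[Vector],
                            txt_global: Vector, txt_phrases: Sequence[Vector],
                            weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical fusion of global-global, global-local and local-local scores.

    All features are assumed to live in one shared embedding space already.
    """
    if not img_parts or not txt_phrases:
        raise ValueError("need at least one image part and one phrase")
    # Global-global: whole image vs. whole description.
    s_gg = cosine(img_global, txt_global)
    # Global-local (simplified): each image part vs. the whole description,
    # and each phrase vs. the whole image, averaged together.
    cross = ([cosine(p, txt_global) for p in img_parts]
             + [cosine(ph, img_global) for ph in txt_phrases])
    s_gl = sum(cross) / len(cross)
    # Local-local (simplified): each phrase keeps its best-matching image part.
    s_ll = sum(max(cosine(ph, p) for p in img_parts) for ph in txt_phrases) / len(txt_phrases)
    w_gg, w_gl, w_ll = weights
    return w_gg * s_gg + w_gl * s_gl + w_ll * s_ll


if __name__ == "__main__":
    # Toy 2-D features: one global vector per modality, two parts, one phrase.
    score = multi_granularity_score(
        img_global=[1.0, 0.0], img_parts=[[1.0, 0.0], [0.0, 1.0]],
        txt_global=[1.0, 0.0], txt_phrases=[[0.0, 1.0]])
    print(score)  # 1 + 1/3 + 1, i.e. about 2.333
```

Taking the maximum over image parts lets each phrase pick its best-aligned region, so a phrase describing one body part is not diluted by unrelated parts. Learned weights or attention-based pooling would be natural replacements for the fixed choices made here.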
“…Furthermore, Li et al [21] have adopted deep filter pairing neural network for person re-identification, and Wang et al [40] have investigated a joint learning framework for re-id while unifying Single-Image Representation (SIR) and Cross-Image Representation (CIR) via convolutional neural networks. More recently, a multi-granularity image-text alignment has been proposed for person re-identification [31].…”
Section: Introduction (mentioning; confidence: 99%)