Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3548057

CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Abstract: Given a natural language description, text-based person retrieval aims to identify images of a target person from a large-scale person image database. Existing methods generally face a color overreliance problem, which means that the models rely heavily on color information when matching cross-modal data. Indeed, color information is an important decision-making accordance for retrieval, but the over-reliance on color would distract the model from other key clues (e.g. texture information, structural informati…

Cited by 52 publications (8 citation statements)
References 35 publications
“…By inputting the augmented grayscale image into the model, the model is compelled to focus on other information like texture and shape during training. While color information is known to be critical in TBPS (Wu et al 2021; Wang et al 2022c), these empirical results suggest that other information, apart from color, is also valuable for person retrieval. Conversely, GaussianBlur, which blurs fine-grained details, significantly degrades performance.…”
Section: Ablations Of Data Augmentation
Citation type: mentioning
confidence: 79%
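The grayscale augmentation described in this statement can be sketched as follows. This is a minimal illustration only: the citing paper's implementation is not shown here, so the function name `random_grayscale`, the probability parameter `p`, and the ITU-R BT.601 luma weights are all assumptions made for the example.

```python
import numpy as np

# Assumed luma weights (ITU-R BT.601); the citing paper does not state which it uses.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def random_grayscale(image, p=0.5, rng=None):
    """Hypothetical helper: with probability p, replace an RGB image by its
    grayscale version so that color cues are unavailable for that sample.

    image: H x W x 3 uint8 array in RGB order.
    Returns an array with the same shape and dtype.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() >= p:
        return image  # leave the image unchanged
    # Weighted sum over the channel axis gives an H x W luminance map.
    gray = image.astype(np.float64) @ LUMA_WEIGHTS
    gray = np.clip(np.rint(gray), 0, 255).astype(image.dtype)
    # Copy the luminance into all three channels to keep the input shape.
    return np.repeat(gray[..., None], 3, axis=2)
```

Keeping three identical channels means the output can be fed to a model that expects RGB input, while texture and shape remain the only usable signals.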
“…We compare our proposed method CMAP with recent state-of-the-art methods, including: (1) Traditional pre-training methods that improve the accuracy of cross-modal matches by attention mechanisms and additional informative cues, such as LGUR (Shao et al 2022), LBUL (Wang et al 2022b); Several methods DSSL (Zhu et al 2021), AXM-Net (Farooq et al 2022), ISANet (Yan et al 2022b), CAIBC (Wang et al 2022a), RKT (Wu et al 2023) and SRCF (Suo et al 2022) propose some simple strategies to achieve distinct semantics for proper alignments.…”
Section: Overall Comparison Results
Citation type: mentioning
confidence: 99%
“…The participants were expected to confirm if the images matched their respective labels. The identification model demanded of participants is to identify beyond color, as used in Wang et al (2022).…”
Section: Methods
Citation type: mentioning
confidence: 99%