2022
DOI: 10.48550/arxiv.2211.07394
Preprint

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

Abstract: We investigate composed image retrieval with text feedback. Users gradually look for the target of interest by moving from coarse to fine-grained feedback. However, existing methods merely focus on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm only considers the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate. In an attempt …

Cited by 3 publications
(4 citation statements)
References 28 publications
“…These approaches deviate from the conventional feedback mechanisms and provide a fresh outlook on evaluating fashion items. Chen et al [50] propose a method that utilizes text feedbacks to improve image retrieval accuracy by incorporating multi-grained uncertainty regularization to handle the complex relationship between the image and text features. Reranking can be performed using reinforcement learning or deep metric learning, depending on the feedback received in a natural language format.…”
Section: Re-ranking Methods (mentioning)
confidence: 99%
“…Existing methods can be categorized into two main categories based on the association between visual and text encoders. The first category involves utilizing two weakly associated encoders to represent the textual and visual information, and then combining these representations for target retrieval (Lee, Kim, and Han 2021; Kim et al 2021; Chen et al 2022). However, these methods without using pre-trained encoders naturally face the issue of insufficient performance: aligning two modalities with limited fashion data is challenging, especially for image regions or modifiers that are difficult to cover within the limited data.…”
Section: Introduction (mentioning)
confidence: 99%
“…In Figure 2, we visualize the gains from employing pre-trained symmetric encoders and the accompanied Reference Dominance Phenomenon. We select MGUR (Chen et al 2022) and Comquery (Xu et al 2023), two representative and open-source methods using weakly associated encoders, and adapt them with symmetric encoders. Figure 2(a) depicts the evident performance improvement from pretrained symmetric encoders.…”
Section: Introduction (mentioning)
confidence: 99%
“…4) Determination Uncertainty. Because of the rarity of the traffic accident data, Vision-TAD and Vision-TAA own the natural Algebraic Uncertainty [17], [18], [19] (i.e., the determination concerns with the uncertainty of latent variables, such as observation noise and data insufficiency) and Epistemic Uncertainty [20] (i.e., model generalization).…”
Section: Introduction (mentioning)
confidence: 99%