2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01756
On Guiding Visual Attention with Language Specification

Cited by 10 publications (2 citation statements)
References 28 publications
“…There is a line of recent work aiming to fix vision classifiers with language inputs. (Petryk et al., 2022) uses attention maps from a pre-trained CLIP to supervise a CNN classifier's spatial attention. (Zhang et al., 2023) probes a vision classifier trained on the joint vision-language embedding space of CLIP using language embeddings of attributes, identifies the attributes causing most failures, and generates a large set of natural language inputs with the influential attributes to rectify the model.…”
Section: Related Work
confidence: 99%
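The approach summarized in this citation statement, supervising a CNN classifier's spatial attention with attention maps from a pre-trained CLIP, can be illustrated with a minimal sketch. The sketch below assumes the CLIP-derived map (`clip_cam`) has already been computed for the language specification (for example, via a Grad-CAM-style relevance map over CLIP's image encoder); the function names, the Grad-CAM formulation, and the L1 attention-matching loss are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): align a CNN classifier's
# Grad-CAM attention with a precomputed CLIP-derived attention map.
import torch
import torch.nn.functional as F


def gradcam_map(features: torch.Tensor, score: torch.Tensor) -> torch.Tensor:
    """Grad-CAM-style map: ReLU of the channel-weighted sum of a spatial feature map."""
    grads, = torch.autograd.grad(score, features, retain_graph=True, create_graph=True)
    weights = grads.mean(dim=(2, 3), keepdim=True)        # pooled gradients as channel weights
    cam = F.relu((weights * features).sum(dim=1))         # (B, H, W)
    return cam / (cam.flatten(1).max(dim=1).values.view(-1, 1, 1) + 1e-6)


def attention_supervision_loss(classifier_features, classifier_logits, labels, clip_cam):
    """Penalize mismatch between classifier attention and the CLIP attention map."""
    # Classifier attention for the ground-truth class scores.
    score = classifier_logits.gather(1, labels.view(-1, 1)).sum()
    cls_cam = gradcam_map(classifier_features, score)
    # Bring the CLIP map to the classifier's spatial resolution and compare.
    clip_cam = F.interpolate(clip_cam.unsqueeze(1), size=cls_cam.shape[-2:],
                             mode="bilinear", align_corners=False).squeeze(1)
    return F.l1_loss(cls_cam, clip_cam)


# Hypothetical training objective: standard cross-entropy plus the attention term.
def training_loss(classifier_features, classifier_logits, labels, clip_cam, lam=1.0):
    ce = F.cross_entropy(classifier_logits, labels)
    return ce + lam * attention_supervision_loss(
        classifier_features, classifier_logits, labels, clip_cam)
```

In practice, the CLIP map would be grounded in the language specification (e.g., a class name or description of the relevant object), so the supervision signal steers the classifier's attention toward language-specified regions rather than spurious context.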
“…While the multi-modal alignment increases the expectations about model reliability due to better grounding and larger availability of data in general, these models are still not immune to fundamental learning problems such as dealing with spurious correlations (Bommasani et al., 2021; Moayeri et al., 2022; Petryk et al., 2022; Agarwal et al., 2021). Therefore, when such models are used as a backbone to solve application-oriented tasks on a given domain, existing spurious correlations specific to that domain or the finetuning data that comes with it may resurface in ways that are harmful to end users.…”
Section: Introduction
confidence: 99%