2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.00517

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

Cited by 107 publications
(83 citation statements)
References 36 publications
“…Recent research suggests that CLIP's compositional capabilities are limited [75]. As shown by our results, restricted domains allow for direct manipulation, without the risk of confounding; indeed, restricted domains may be easier to explore but further investigation is needed to confirm compositional capabilities.…”
Section: Grounding and Compositionality (mentioning)
confidence: 99%
“…Visual-linguistic compositionality has been explored for image-language models [66,99,120,126]. The compositional nature of language allows the evaluation of various aspects: meaning change due to change in word order [99], relationship between objects [126], systematicity and productivity [66], etc.…”
Section: Time In Vision (mentioning)
confidence: 99%
“…Recent research has started to probe VLMs for such information. Thrush et al. (2022) proposed Winoground, a dataset of hand-curated test cases that document a clear lack of compositional and pragmatic understanding in VLMs. The dataset is high quality but relatively small scale; its 400 test cases cover a wide range of linguistic phenomena (e.g., relation, attribution, pragmatics, world knowledge), making it hard to render statistically significant results about relational and attributive abilities.…”
Section: Attribution Relation and Order (Aro) Benchmark: When Do Mode... (mentioning)
confidence: 99%
“…Parcalabescu et al (2021) show that VLMs have difficulties in counting objects in images. In terms of the evaluation part of our paper, Winoground (Thrush et al, 2022) presents the nearest neighbor to our work. Winoground is a carefully curated dataset that aims to evaluate compositional and pragmatics language understanding of VLMs.…”
Section: Related Work (mentioning)
confidence: 99%