Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021) 2021
DOI: 10.18653/v1/2021.repl4nlp-1.16

Probing Cross-Modal Representations in Multi-Step Relational Reasoning

Abstract: We investigate the representations learned by vision and language models in tasks that require relational reasoning. Focusing on the problem of assessing the relative size of objects in abstract visual contexts, we analyse both one-step and two-step reasoning. For the latter, we construct a new dataset of three-image scenes and define a task that requires reasoning at the level of the individual images and across images in a scene. We probe the learned model representations using diagnostic classifiers. Our exp…
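As a rough illustration of the probing setup named in the abstract, the sketch below trains a linear diagnostic classifier on frozen model representations to predict a relational property of a scene (e.g., a binary "which object is larger" label). The data, feature dimensionality, and label set are placeholder assumptions for illustration only, not the authors' actual pipeline.

```python
# Minimal sketch of a diagnostic-classifier probe, assuming we already have
# frozen cross-modal representations for each scene and gold labels for the
# relational property of interest. All data below are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Placeholder data: 1000 scenes, 512-dimensional frozen representations,
# binary labels for the probed relation.
features = rng.normal(size=(1000, 512))
labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# A linear probe: if a simple classifier can recover the relation from the
# frozen representations, that information is (linearly) encoded in them.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```

Probe accuracy is then compared against a majority-class or random baseline; only a clear gap is taken as evidence that the representations encode the probed relation.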

Cited by 2 publications (2 citation statements)
References 29 publications
“…Moreover, our study can inform computational work on question generation in the domain of natural language processing (see, e.g., Wang and Lake, 2021). Asking (informative) questions could also be of crucial importance, for example, to multimodal AI models asked to provide a correct answer to a question regarding the content of an image (Antol et al., 2015; Johnson et al., 2017) or the abstract relation tying various scenes depicting similar objects (Parfenova, Elliott, Fernández, & Pezzelle, 2021). This possibility could reduce the uncertainty of a model—when the input question is vague, ambiguous, or can be misinterpreted—and drive its decisions toward the correct output.…”
Section: Discussion
confidence: 99%
“…Finally, Parfenova et al (2021) recently created a dataset of three-image scenes to probe two-step reasoning.…”
Section: Related Work
confidence: 99%