Proceedings of the 25th Conference on Computational Natural Language Learning 2021
DOI: 10.18653/v1/2021.conll-1.12

Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training

Abstract: Language grounding aims at linking the symbolic representation of language (e.g., words) to the rich perceptual knowledge of the outside world. The general approach is to embed both textual and visual information into a common space (the grounded space) confined by an explicit relationship. We argue that, since concrete and abstract words are processed differently in the brain, such approaches sacrifice the abstract knowledge obtained from textual statistics in the process of acquiring perceptual information. …

Cited by 14 publications (9 citation statements)
References 46 publications
“…However, the data efficiency gap applies more broadly to language learning. Recent studies evaluating contemporary Transformer-based models have largely reported negative results for the effect of multimodality on semantics (Shahmohammadi et al., 2022), commonsense reasoning (Yun et al., 2021), and learning biases (Kuribayashi, 2023). To the best of our knowledge, ours is the first work to perform targeted syntactic evaluation (Marvin and Linzen, 2018; Warstadt et al., 2020a) on multimodal models.…”
Section: Cognitively Oriented Approaches
Confidence: 99%
“…That is, the meanings of words are solely based on other words, without links to the outside world. Moreover, they also have practical problems; for example, because of the fundamental assumption of DSM, antonyms belonging to the same topical class (e.g., small and big) typically end up very close together in purely text-based vector spaces (see, e.g., Shahmohammadi et al., 2021, and references cited there). As a consequence, applications for, e.g., sentiment analysis cannot well distinguish between “it was a good movie” and “it was a bad movie”.…”
Section: Introduction
Confidence: 99%

How direct is the link between words and images?
Shahmohammadi, Heitmeier, Shafaei-Bajestan et al. (2023) [self-citation]
“…All these methods extract the semantic vectors purely from textual information. Other studies integrate visual information on top of that and create multi-modal embeddings (e.g., Shahmohammadi, Lensch, & Baayen, 2021).…”
Section: Introduction
Confidence: 99%