Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
DOI: 10.18653/v1/s19-1013
Deconstructing multimodality: visual properties and visual context in human semantic processing

Abstract: Multimodal semantic models that extend linguistic representations with additional perceptual input have proved successful in a range of natural language processing (NLP) tasks. Recent research has successfully used neural methods to automatically create visual representations for words. However, these works have extracted visual features from complete images and have not examined how different kinds of visual information impact performance. In contrast, we construct multimodal models that differentiate between…


Cited by 6 publications (5 citation statements)
References 23 publications
“…Based on the concepts and experimental results of psychology and neuroscience, there have been attempts to solve the problem of neural networks. In particular, the theory of mind, inductive bias, and intrinsic motivation were effective methods in embodied visual language interaction [190][191][192].…”
Section: Multi-agent Optimization (mentioning)
Confidence: 99%
“…More recently, some work has explored the quality of representations learned from images only (Lüddecke et al., 2019) or by combining language, vision, and emojis (Rotaru and Vigliocco, 2020). In parallel, new evaluation methods based, for example, on decoding brain activity (Davis et al., 2019) or on success in tasks such as image retrieval (Kottur et al., 2016) have been proposed. This mass of studies has overall demonstrated the effectiveness of multimodal representations in approximating human semantic intuitions better than purely textual ones.…”
Section: Evaluating Multimodal Representations (mentioning)
Confidence: 99%
“…This dataset includes at least one label for all but four of the 21,841 synsets used in ImageNet. ImageNet has been used to solve tasks at the intersection of language and vision research (Chen et al., 2019; Davis et al., 2019; Vempala and Preot, 2019). For Arabic computer vision, limited related work currently exists.…”
Section: ‫كلب‬ (“dog”) (mentioning)
Confidence: 99%