Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2014
DOI: 10.3115/v1/d14-1032
Learning Abstract Concept Embeddings from Multi-Modal Data: Since You Probably Can't See What I Mean

Abstract: Models that acquire semantic representations from both linguistic and perceptual input are of interest to researchers in NLP because of the obvious parallels with human language learning. Performance advantages of the multi-modal approach over language-only models have been clearly established when models are required to learn concrete noun concepts. However, such concepts are comparatively rare in everyday language. In this work, we present a new means of extending the scope of multi-modal models to more comm…

Cited by 72 publications (56 citation statements)
References 25 publications
“…While we have already presented two shuffling strategies in this work, one line of future work will investigate different possibilities of "blending in" words from two different vocabularies into pseudo-bilingual documents in a more structured and systematic manner. For instance, one approach to generating pseudo-training sentences for learning from textual and perceptual modalities has been recently introduced (Hill & Korhonen, 2014). However, it is not straightforward how to extend this approach to the generation of pseudo-bilingual training documents.…”
Section: Further Discussion (mentioning)
confidence: 99%
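The quoted passage points to the pseudo-training-sentence approach attributed to Hill & Korhonen (2014), in which perceptual features are blended into the linguistic context of a concept word. Below is a minimal sketch of that idea; the function name, arguments, and insertion scheme are illustrative assumptions, not the authors' exact recipe.

```python
import random

def blend_perceptual_features(sentence, concept, features, k=2, seed=0):
    """Blend up to k perceptual feature tokens into a sentence mentioning `concept`,
    producing a pseudo-training sentence. Hypothetical sketch of the approach
    described above, not the published procedure."""
    rng = random.Random(seed)
    tokens = sentence.split()
    if concept not in tokens:
        return tokens  # no mention of the concept, leave the sentence unchanged
    # Sample perceptual features (e.g., feature-norm predicates) and splice them
    # in right after the concept word so they fall inside its context window
    # during skip-gram training.
    sampled = rng.sample(features, min(k, len(features)))
    i = tokens.index(concept)
    return tokens[: i + 1] + sampled + tokens[i + 1 :]

# Illustrative usage with made-up feature-norm tokens:
print(blend_perceptual_features(
    "the bear slept in the cave", "bear",
    ["has_fur", "is_large", "eats_fish"]))
```

The open question raised in the citation is how a comparable blending scheme would carry over from textual-plus-perceptual input to pseudo-bilingual documents, where the two "modalities" are two vocabularies.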
“…For each concept (e.g., bear or eggplant), we inspected the images in the development set and chose all visual attributes that applied. If an attribute was generally true for the concept, but the images did not provide enough evidence, the attribute was nevertheless chosen and labeled with <no_evidence>. For example, a plum has a pit, but most images in ImageNet show plums where only the outer part of the fruit is visible.…”
Section: Visual Attributes (mentioning)
confidence: 99%
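The annotation protocol in the quoted passage keeps attributes that are true of a concept even when the images cannot confirm them, flagging them instead. A small sketch of how such annotations could be recorded follows; the field names and status values are assumptions standing in for the <no_evidence> label, not the cited authors' data format.

```python
# Hypothetical per-concept attribute records: each attribute is kept, and the
# status notes whether the development-set images actually supported it.
annotations = {
    "plum": {
        "is_round": "observed",       # visible in the images
        "has_pit": "no_evidence",     # true of plums, but images show only the outside
    },
    "bear": {
        "eats_fish": "observed",
        "hibernates": "no_evidence",  # behavioural attribute, rarely depicted
    },
}

# Attributes flagged "no_evidence" stay attached to the concept but can be
# treated differently (e.g., down-weighted) when building feature vectors.
for concept, attrs in annotations.items():
    observed = [a for a, status in attrs.items() if status == "observed"]
    print(concept, "observed attributes:", observed)
```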
“…Several models ([15], [19], [22]) present extensions of Latent Dirichlet Allocation (LDA, [42]) where topic distributions are learned from words and other perceptual units, treating both as observed variables. Hill and Korhonen [34] extend the skip-gram network model [33] in a similar fashion: perceptual input is encoded verbally and treated as a word's linguistic context, whereas Lazaridou et al. [40] modify skip-gram's learning objective so that representations are trained to predict linguistic and visual features. In most cases the visual and textual modalities are decoupled and obtained independently, i.e., from text corpora and feature norms or image databases (but see [19] for an exception).…”
Section: Grounded Semantic Spaces (mentioning)
confidence: 99%
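The quoted passage contrasts two routes to multi-modal skip-gram training. A minimal sketch of the first route, in which perceptual input is encoded verbally and simply treated as part of the word's linguistic context, is given below using gensim's standard skip-gram implementation; the toy corpus, feature tokens, and hyperparameters are illustrative assumptions, not the setup of the cited papers. The Lazaridou et al. variant, which changes the learning objective itself to predict visual feature vectors, is not shown.

```python
from gensim.models import Word2Vec

# Toy corpus in which perceptual features appear as ordinary tokens next to the
# concept words they describe, so an unmodified skip-gram model learns from
# both modalities at once.
corpus = [
    ["the", "bear", "has_fur", "is_large", "slept", "in", "the", "cave"],
    ["she", "bought", "an", "eggplant", "is_purple", "at", "the", "market"],
    ["the", "bear", "eats_fish", "waded", "into", "the", "river"],
]

# sg=1 selects the skip-gram architecture; the other settings are illustrative.
model = Word2Vec(sentences=corpus, vector_size=50, window=5,
                 min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("bear", topn=3))
```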