2014
DOI: 10.1613/jair.4135
Multimodal Distributional Semantics

Abstract: Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. …
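As a rough illustration of the kind of text-based distributional model the abstract describes, the sketch below builds a word-word co-occurrence matrix from a toy corpus, reweights it with positive PMI, and reduces it with SVD before estimating relatedness by cosine similarity. The corpus, window choice, weighting scheme, and dimensionality are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch (not the paper's implementation): a co-occurrence-based
# distributional model reduced with SVD, with cosine similarity as the
# relatedness estimate.
import numpy as np
from itertools import combinations

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a dog and a cat played on the mat".split(),
]

# Vocabulary and word-word co-occurrence counts (sentence-level window).
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for w1, w2 in combinations(sent, 2):
        counts[idx[w1], idx[w2]] += 1
        counts[idx[w2], idx[w1]] += 1

# Positive PMI, a common reweighting for raw co-occurrence counts.
total = counts.sum()
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(counts * total / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Low-rank word vectors via truncated SVD.
k = 2
u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
word_vectors = u[:, :k] * s[:k]

def relatedness(w1, w2):
    """Cosine similarity between two word vectors."""
    a, b = word_vectors[idx[w1]], word_vectors[idx[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(relatedness("cat", "dog"))
```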

Cited by 765 publications (743 citation statements)
References 88 publications
“…The second model is a deep learning-based variant of kCCA (DCCA, [82]) which computes representations by passing the two views through a deep network which is fine-tuned to maximize the total correlation of the output layers. The third model emulates Bruni et al.'s [37] integration mechanism. Specifically, we concatenated the textual and visual vectors and projected them onto a lower dimensional latent space using singular value decomposition, a mathematical technique for reducing the dimensionality of semantic spaces [83].…”
Section: Comparison Models (mentioning)
confidence: 97%
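As a rough sketch of the concatenate-then-project integration mechanism the excerpt attributes to Bruni et al., the snippet below concatenates a textual and a visual matrix and projects the result onto a lower-dimensional latent space with truncated SVD. The matrices, dimensionalities, and random data are placeholders, not the cited authors' setup.

```python
# Hedged sketch of late fusion by concatenation followed by SVD projection.
import numpy as np

rng = np.random.default_rng(0)
n_words = 1000
textual = rng.standard_normal((n_words, 300))  # stand-in textual vectors
visual = rng.standard_normal((n_words, 128))   # stand-in visual vectors

# Concatenate the two views for every word...
fused = np.hstack([textual, visual])           # shape (n_words, 428)

# ...then project onto a lower-dimensional latent space with truncated SVD.
k = 100
u, s, vt = np.linalg.svd(fused, full_matrices=False)
multimodal = u[:, :k] * s[:k]                  # shape (n_words, k)
print(multimodal.shape)
```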
“…Other models focus on the visual modality and exploit image databases, such as ImageNet [35] or ESP [36]. A few approaches ([19], [37]) use visual words which they derive by clustering SIFT descriptors [38] extracted from images, or combine both feature norms and visual words [22]. Drawing inspiration from the successful application of attribute classifiers in object recognition, Silberer et al. [21] show that automatically predicted visual attributes from images can act as substitutes for feature norms without any critical information loss.…”
Section: Grounded Semantic Spaces (mentioning)
confidence: 99%
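The "visual words" idea mentioned in the excerpt, i.e. clustering local descriptors such as SIFT and representing an image as a histogram over the resulting clusters, can be sketched as follows. The random descriptors and the cluster count are stand-ins for real SIFT output and a tuned vocabulary size, used here only for illustration.

```python
# Hedged sketch of a bag-of-visual-words representation: cluster local
# descriptors with k-means, then describe an image by its histogram over
# the learned cluster ("visual word") vocabulary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Placeholder for 128-dimensional descriptors pooled from training images.
train_descriptors = rng.standard_normal((5000, 128))

# Learn the visual-word vocabulary by clustering the descriptors.
n_visual_words = 50
kmeans = KMeans(n_clusters=n_visual_words, n_init=10, random_state=0)
kmeans.fit(train_descriptors)

def bag_of_visual_words(descriptors):
    """Map one image's local descriptors to a normalized visual-word histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=n_visual_words).astype(float)
    return hist / max(hist.sum(), 1.0)

# One "image" with 200 local descriptors.
image_descriptors = rng.standard_normal((200, 128))
print(bag_of_visual_words(image_descriptors).shape)  # (50,)
```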
“…CAT assigns each word the concatenation of its CL, WL, and FT embeddings, i.e., performs late fusion (Bruni et al., 2014). As the individual models produce vectors of different average magnitudes, we rescale the embeddings produced by individual models prior to this concatenation, such that, after rescaling, each constituent model has the same average vector magnitude, when averaged over all words present in the training data.…”
Section: Fusion Model (mentioning)
confidence: 99%
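A minimal sketch of the rescaling step the excerpt describes: each constituent embedding table is scaled so that its average vector magnitude matches a common target before the late-fusion concatenation. The toy CL, WL, and FT tables and the choice of target value are illustrative assumptions, not the cited configuration.

```python
# Hedged sketch: rescale each embedding table to a common average L2 norm,
# then concatenate per word (late fusion).
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the CL, WL, and FT embedding tables (same vocabulary order).
models = {
    "CL": rng.standard_normal((500, 100)) * 0.1,
    "WL": rng.standard_normal((500, 300)) * 5.0,
    "FT": rng.standard_normal((500, 300)),
}

# Average vector magnitude per model and a shared target magnitude.
avg_norms = {name: np.linalg.norm(emb, axis=1).mean() for name, emb in models.items()}
target = float(np.mean(list(avg_norms.values())))

# Rescale every model to the target magnitude, then concatenate per word.
rescaled = {name: emb * (target / avg_norms[name]) for name, emb in models.items()}
cat_embeddings = np.hstack([rescaled[n] for n in ("CL", "WL", "FT")])

# After rescaling, every constituent table has the same average vector magnitude.
print({n: round(float(np.linalg.norm(e, axis=1).mean()), 3) for n, e in rescaled.items()})
print(cat_embeddings.shape)  # (500, 700)
```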