Proceedings of the 21st ACM International Conference on Multimedia 2013
DOI: 10.1145/2502081.2502112
Online multimodal deep similarity learning with application to image retrieval

Cited by 152 publications (70 citation statements)
References 35 publications
“…In doing so, we presented a simple and easy-to-implement architecture for acquiring semantic representations of both types of concept from linguistic and perceptual input. While neuro-probabilistic language models have been applied to the problem of multi-modal representation learning previously (Srivastava and Salakhutdinov, 2012; Wu et al., 2013; Silberer and Lapata, 2014), our model and experiments develop this work in several important ways. First, we address the problem of learning abstract concepts.…”
Section: Discussion
confidence: 99%
“…A fusion representation is learned simultaneously and used for the final prediction. Wu et al. [17] use multimodal deep neural networks to learn a combined non-linear similarity function. They train multiple deep denoising autoencoders for different low-level features in an unsupervised manner.…”
Section: Feature Transformation with Deep Neural Network
confidence: 99%
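The pipeline this excerpt describes — an unsupervised denoising autoencoder per low-level feature type, with a similarity function learned over the combined hidden codes — can be sketched in NumPy. The layer sizes, masking-noise level, modality names, and the diagonal bilinear similarity below are illustrative assumptions for the sketch, not the cited paper's actual architecture (which learns the similarity online from ranking data):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """Tiny tied-weight denoising autoencoder, trained by gradient descent."""
    def __init__(self, n_in, n_hidden, noise=0.3, lr=0.1):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # hidden-layer bias
        self.c = np.zeros(n_in)       # reconstruction bias
        self.noise, self.lr = noise, lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def train_step(self, x):
        # Corrupt the input with masking noise, then reconstruct the clean x.
        x_noisy = x * (rng.random(x.shape) > self.noise)
        h = self.encode(x_noisy)
        x_rec = sigmoid(h @ self.W.T + self.c)
        err = x_rec - x
        d_rec = err * x_rec * (1.0 - x_rec)      # backprop through decoder
        d_h = (d_rec @ self.W) * h * (1.0 - h)   # backprop through encoder
        self.W -= self.lr * (np.outer(x_noisy, d_h) + np.outer(d_rec, h))
        self.b -= self.lr * d_h
        self.c -= self.lr * d_rec
        return float(np.mean(err ** 2))          # squared reconstruction error

# One autoencoder per low-level feature type; dimensions are made up.
modalities = {"color_hist": 16, "edge_hist": 12}
daes = {m: DenoisingAutoencoder(d, n_hidden=8) for m, d in modalities.items()}
data = {m: rng.random((20, d)) for m, d in modalities.items()}

# Unsupervised pre-training, one modality at a time.
for epoch in range(100):
    for m in modalities:
        for x in data[m]:
            daes[m].train_step(x)

def fused_code(features):
    """Concatenate the per-modality hidden codes into one representation."""
    return np.concatenate([daes[m].encode(features[m]) for m in modalities])

# Placeholder for the learned similarity: a diagonal bilinear form whose
# weights would, in the paper's setting, be learned from ranking triplets.
w = rng.random(8 * len(modalities))

def similarity(fa, fb):
    return float(np.dot(w * fused_code(fa), fused_code(fb)))
```

The point of the per-modality split is that each autoencoder can be pre-trained independently on its own feature type; only the similarity over the fused codes ties the modalities together.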
“…Moreover, deep learning-based multi-modal fusion methods have also been proposed recently [14], [15], [16]. For example, in [17], the distance metric between different modalities is learned by deep neural networks. The methods mentioned above focus on the problem of how to utilize multiple features more effectively.…”
Section: Introduction
confidence: 99%
“…Previous work has focused on projecting words and images onto a common space using a variety of methods, including deep and restricted Boltzmann machines [43], autoencoders [44], and recursive neural networks [45]. Similar methods were employed to combine other modalities such as speech and video or images ([24], [43]).…”
Section: Multimodal Deep Learning
confidence: 99%
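The "common space" idea in this excerpt — mapping two modalities so that paired items land near each other — has a classical linear instance in canonical correlation analysis, which is often used as a baseline for the deep methods cited. The following is a small NumPy sketch of CCA on synthetic paired data (the data, dimensions, and variable names are invented for illustration; it is not any of the cited models):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic paired data: both "modalities" observe the same latent signal z.
n, d_x, d_y, d_z = 500, 10, 8, 2
z = rng.normal(size=(n, d_z))
X = z @ rng.normal(size=(d_z, d_x)) + 0.1 * rng.normal(size=(n, d_x))
Y = z @ rng.normal(size=(d_z, d_y)) + 0.1 * rng.normal(size=(n, d_y))

def inv_sqrt(C, eps=1e-8):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

def cca(X, Y, k=2):
    """Classical CCA: projections maximizing correlation of paired samples."""
    m = X.shape[0]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Cxx = Xc.T @ Xc / (m - 1)
    Cyy = Yc.T @ Yc / (m - 1)
    Cxy = Xc.T @ Yc / (m - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance gives the canonical directions.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

A, B, corrs = cca(X, Y)
u = (X - X.mean(0)) @ A[:, 0]   # modality 1 projected into the common space
v = (Y - Y.mean(0)) @ B[:, 0]   # modality 2 projected into the common space
```

With a strong shared latent signal, the first canonical correlation `corrs[0]` is close to 1; the deep models in [43]-[45] replace these linear maps with non-linear encoders but keep the same "align paired projections" objective.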