Cross-modal Retrieval with Correspondence Autoencoder

Feng, Fan; Wang, Xiaojie; Li, Ruifan

doi:10.1145/2647868.2654902

Cited by 523 publications

(442 citation statements)

References 20 publications

Supporting

Mentioning

436

Contrasting

Unclassified

“…Existing methods [5,9,10] mostly combine inter-media constraints (such as correlation constraints [10]) and intra-media constraints (such as semantic [5] or reconstruction constraints [9]) to train their models for building common representations. Since inter-media and intra-media constraints both need to be optimized as objective functions, there is a complex optimization problem limiting the performance of cross-media retrieval.…”

Section: Residual Correlation Learning

mentioning

confidence: 99%

“…Instead of designing stacked nonlinear layers to approximate f_c(x) by a correlation constraint as [10], we design several stacked nonlinear layers to approximate the residual function Δf(x) = f_c(x) − f_s(x). The process of f_s(x) + Δf(x) is realized by a shortcut connection and an element-wise addition, so that the residual function is parameterized by residual layers.…”

Section: Fig 1 An Overview Of Our Residual Correlation Network (RCN)

mentioning

confidence: 99%

“…Cross-media retrieval methods based on deep neural networks (DNN) have shown their remarkable performance [9][10][11][12], making use of DNN's powerful abstraction ability to learn the common representations for different media types. An extension of the restricted Boltzmann machine (RBM) is applied by Ngiam et al [9] to get shared representation and bimodal autoencoders (Bimodal AE) is proposed, producing common representation for different media types by a shared code layer.…”

mentioning

confidence: 99%

“…An extension of the restricted Boltzmann machine (RBM) is applied by Ngiam et al [9] to get shared representation and bimodal autoencoders (Bimodal AE) is proposed, producing common representation for different media types by a shared code layer. Feng et al [10] propose a method named correspondence autoencoder (Corr-AE) to model the reconstruction and correlation constraints simultaneously. Peng et al [11] propose the cross-media multiple deep network (CMDN), using hierarchical learning to exploit the complex cross-media correlation.…”

mentioning

confidence: 99%

“…However, previous methods [5,9,10] mostly use inter-media constraints (such as correlation constraints [10]) and intra-media constraints (such as semantic [5] or reconstruction constraints [9,10]) to build common representations for cross-media retrieval. It is challenging to optimize common representation learning since inter-media and intra-media constraints both need to be considered as objective functions [13,14], which restrains the performance of cross-media retrieval.…”

mentioning

confidence: 99%