2018
DOI: 10.1016/j.ins.2017.08.026

Attention driven multi-modal similarity learning

Abstract: Learning a function that measures the similarity or relevance between objects is an important machine learning task, referred to as similarity learning. Conventional methods are usually insufficient for processing complex patterns, while more sophisticated methods produce results supported by parameters and mathematical operations that are hard to interpret. To improve both model robustness and interpretability, we propose a novel attention driven multi-modal algorithm, which learns a distributed similarity score …
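The abstract's core idea, an attention mechanism distributing a similarity score across modalities, can be sketched as follows. This is a minimal illustration, not the paper's actual model: the function names, the cosine base similarity, and the softmax attention are all assumptions made for the sketch.

```python
import numpy as np

def modality_similarities(x_views, y_views):
    """Cosine similarity between paired feature vectors, one per modality."""
    sims = [float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
            for x, y in zip(x_views, y_views)]
    return np.array(sims)

def attention_weights(scores):
    """Softmax turns raw per-modality relevance scores into weights summing to 1."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attended_similarity(x_views, y_views, scores):
    """Overall similarity = attention-weighted sum of per-modality similarities."""
    sims = modality_similarities(x_views, y_views)
    return float(np.dot(attention_weights(scores), sims))
```

Because the attention weights are normalized, the combined score stays interpretable: each weight indicates how much a modality contributed to the final similarity.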

Cited by 14 publications (4 citation statements). References 33 publications.
“…Existing works [61] have shown that, nowadays, although deep neural networks are gradually replacing hand-crafted feature extraction, they do not encourage the use of expert knowledge, for instance, provided by existing feature extraction methods, to enhance the learning. To improve this, we have previously proposed an unsupervised multi-view training algorithm in [22], [62]. It pre-trains a CNN to preserve knowledge offered by multiple image feature extraction methods that characterize heterogeneous properties of the image content.…”
Section: B. Model Construction
confidence: 99%
“…Therefore, these images are deemed to be connected under the different modalities of "juice", "tree" and "company". To accommodate this phenomenon, various multimodal similarity learning algorithms have been developed, e.g., by using different kernel functions [19], base metrics [20], transformation functions [21] or distributed relation measures across multiple dimensions [22] in order to model such diverse relation modalities. A brief review on this is provided in Section II-B.…”
Section: Introduction
confidence: 99%
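The statement above mentions combining base metrics to model diverse relation modalities. A hypothetical sketch of that idea, not any cited paper's implementation: a convex combination of simple base metrics (Euclidean and Manhattan here, chosen only for illustration) is itself a valid distance when the weights are nonnegative.

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def manhattan(x, y):
    return float(np.abs(x - y).sum())

def combined_distance(x, y, weights):
    """Weighted combination of base metrics; each base metric can emphasize
    a different relation modality between the two objects."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0), "nonnegative weights keep the combination a metric"
    return float(sum(wi * m(x, y) for wi, m in zip(w, (euclidean, manhattan))))
```

In similarity-learning settings the weights would typically be learned from labeled pairs rather than fixed by hand.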
“…Along the same direction, the self-attention GAN was constructed for the generation of natural images, where the self-attention mechanism was incorporated into both the generator and the discriminator of a convolutional GAN. Attention has also been used in other applications, such as similarity learning (Gao, 2018) and hand gesture recognition (Li, 2018).…”
Section: Accepted Manuscript
confidence: 99%
“…In recent years, long short-term memory (LSTM) networks (Bengio et al., 1994; Hochreiter and Schmidhuber, 1997), BiLSTM (Pennington et al., 2014a) and gated recurrent units (GRU) (Cho et al., 2014) have been widely used to obtain sentence representation vectors, achieving better results than traditional methods. The attention model, also known as the alignment model, focuses on the interaction between two sentences (Zheng et al., 2018; Gao et al., 2018) and is usually applied in information extraction, relation extraction, text summarization and machine translation. In machine translation, the attention model can focus on one or a few input words to make the translation more accurate when generating each new word.…”
Section: Introduction
confidence: 99%
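The alignment-style attention between two sentences described above can be sketched as follows. This is a generic illustration under assumed inputs (pre-computed word embeddings), not the cited models: for each word of one sentence, attention over the other sentence's words produces a soft-aligned summary.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def align(a, b):
    """Soft alignment between two sentences.

    a: (m, d) word embeddings of sentence A; b: (n, d) embeddings of sentence B.
    Returns an (m, d) matrix: for each word in A, an attention-weighted
    summary of sentence B's words.
    """
    scores = a @ b.T                 # (m, n) word-pair relevance scores
    attn = softmax(scores, axis=1)   # each row sums to 1 over B's words
    return attn @ b                  # aligned representation of B for each word of A
```

The same mechanism underlies attention in machine translation, where each generated target word attends to a few relevant source words.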