2020
DOI: 10.1007/978-3-030-58586-0_2

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching

Cited by 130 publications (83 citation statements)
References 36 publications
“…In contrast to previous studies, SMFEA models the relation structure of intra-modal fragments/words using a fixed contextual structure and aligns the two modalities into a joint embedding space in terms of both semantics and structure. The most relevant existing work to ours is [30], which aligns the visual and textual representations by measuring the consistency of the corresponding concepts in each modality. However, unlike [30], SMFEA approaches this in a novel way by exploiting the learned multi-modal semantic trees to…”
Section: Structured Feature Embedding (mentioning, confidence: 99%)
“…Specifically, He et al. [3] created a new fine-grained cross-media dataset that contains four modalities (image, text, video, and audio). Because collecting cross-modal datasets is difficult, researchers usually study retrieval tasks across two modalities [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. Zhang et al. [4] proposed aggregating the global context to mine the semantic information of the image and text modalities.…”
Section: Introduction (mentioning, confidence: 99%)
“…Wei et al. [8] proposed a universal weighting metric learning method. In addition, some researchers have explored cross-modal retrieval algorithms from the perspectives of attention mechanisms, graph reasoning, and loss function optimization [9][10][11][12][13][14][15][16][17][18][19][20]. These studies promote the development of cross-modal retrieval tasks.…”
Section: Introduction (mentioning, confidence: 99%)