Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/720

Knowledge Aware Semantic Concept Expansion for Image-Text Matching

Abstract: Image-text matching is a vital cross-modality task in artificial intelligence and has attracted increasing attention in recent years. Existing works have shown that learning semantic concepts is useful to enhance image representation and can significantly improve the performance of both image-to-text and text-to-image retrieval. However, existing models simply detect semantic concepts from a given image, which are less likely to deal with long-tail and occlusion concepts. Frequently co-occurred concepts in the…

Cited by 63 publications (46 citation statements) | References 12 publications
“…Many algorithms have been invented to generate scene graphs from images [18,19,20,21] since this structure and its dataset became publicly available. By virtue of them, some recent studies on image-text retrieval [15,16] have followed the scene graph approach. Shi et al. [16] created a scene concept graph based on a popular scene graph dataset [22] and used it to expand the concepts detected in images when extracting visual features.…”
Section: Related Work (citation type: mentioning, confidence: 99%)
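The concept-expansion idea cited above can be illustrated with a minimal sketch: a toy co-occurrence graph suggests concepts that frequently appear alongside the detected ones, which may recover long-tail or occluded concepts. The class and method names (SceneConceptGraph, expand) are illustrative assumptions, not the paper's actual implementation.

    from collections import defaultdict

    class SceneConceptGraph:
        """Toy scene concept graph: nodes are concepts, weighted edges
        record how often two concepts co-occur in annotated scenes."""

        def __init__(self):
            self.cooccur = defaultdict(dict)  # concept -> {neighbor: weight}

        def add_cooccurrence(self, a, b, weight=1.0):
            self.cooccur[a][b] = self.cooccur[a].get(b, 0.0) + weight
            self.cooccur[b][a] = self.cooccur[b].get(a, 0.0) + weight

        def expand(self, detected, top_k=3):
            """Expand detected concepts with their strongest co-occurring
            neighbors (hypothetical scoring: summed edge weights)."""
            scores = defaultdict(float)
            for c in detected:
                for neighbor, w in self.cooccur.get(c, {}).items():
                    if neighbor not in detected:
                        scores[neighbor] += w
            ranked = sorted(scores, key=scores.get, reverse=True)
            return list(detected) + ranked[:top_k]

    # Usage: a detector finds "person" and "surfboard"; the graph
    # suggests "wave", which may be occluded in the image.
    g = SceneConceptGraph()
    g.add_cooccurrence("person", "surfboard", 5.0)
    g.add_cooccurrence("surfboard", "wave", 4.0)
    g.add_cooccurrence("person", "wave", 2.0)
    print(g.expand({"person", "surfboard"}))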
“…In the image-text retrieval field, nodes and edges represent the objects detected in images or captions and the associations among those objects, as depicted in Figure 1, where the green rectangles are nodes indicating detected objects and the orange rectangles are edges illustrating the relationships between them. Recent research [15,16] has obtained better results in this retrieval area by utilizing this relational information. Hence, this data structure holds promise for the retrieval field.…”
Section: Introduction (citation type: mentioning, confidence: 99%)
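The node/edge structure described in that statement maps naturally onto a small data structure. Below is a minimal sketch under the assumption that objects are stored as labels and relations as (subject, predicate, object) index triples; SceneGraph and its methods are hypothetical names for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class SceneGraph:
        """Nodes = detected objects, edges = relations between them."""
        objects: list = field(default_factory=list)    # e.g. ["man", "horse"]
        relations: list = field(default_factory=list)  # (subj_idx, predicate, obj_idx)

        def add_object(self, label):
            self.objects.append(label)
            return len(self.objects) - 1

        def add_relation(self, subj, predicate, obj):
            self.relations.append((subj, predicate, obj))

        def triples(self):
            """Yield human-readable (subject, predicate, object) triples."""
            for s, p, o in self.relations:
                yield (self.objects[s], p, self.objects[o])

    sg = SceneGraph()
    man = sg.add_object("man")
    horse = sg.add_object("horse")
    sg.add_relation(man, "riding", horse)
    print(list(sg.triples()))  # [('man', 'riding', 'horse')]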
“…Context information plays a pivotal role in understanding a sentence for many natural language processing tasks, such as neural machine translation [4,32], text summarization [29], and question answering [27]. Analogously, visual contextual relationships can contribute to obtaining fine-grained image region representations, which would benefit various tasks including image captioning [41], VQA [7], and image-text matching [17,31,35,40]. To exploit visual and textual context and capture implicit relations among intra-modal fragments, researchers have presented some structured models for different multi-modal tasks.…”
Section: Intra-modal Context Modeling (citation type: mentioning, confidence: 99%)
“…In the field of image-text matching, Li et al. [17] performed local-global semantic reasoning using a Graph Convolutional Network (GCN) and a Gated Recurrent Unit. To learn comprehensive representations, Wang et al. [35] and Shi et al. [31] refined visual relationships by leveraging external scene graphs [13]. Wu et al. [40] considered fragment relations in images and texts to obtain self-attention embeddings, achieving promising intra-modal context modeling.…”
Section: Intra-modal Context Modeling (citation type: mentioning, confidence: 99%)
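The GCN-plus-GRU pattern mentioned in that statement can be sketched in a few lines of PyTorch: one graph-convolution step lets each image region aggregate its neighbors' features (local reasoning), and a GRU then fuses the region sequence into one global embedding. This is a generic sketch of the pattern, not the exact model of Li et al. [17]; the layer sizes and fully connected adjacency are assumptions.

    import torch
    import torch.nn as nn

    class GCNLayer(nn.Module):
        """One graph-convolution step: each region aggregates features
        from its neighbors via a row-normalized adjacency matrix."""

        def __init__(self, dim):
            super().__init__()
            self.linear = nn.Linear(dim, dim)

        def forward(self, x, adj):
            # x: (num_regions, dim); adj: (num_regions, num_regions)
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
            return torch.relu(self.linear((adj / deg) @ x))

    regions = torch.randn(5, 64)            # 5 detected region features
    adj = torch.ones(5, 5)                  # fully connected, for illustration
    local = GCNLayer(64)(regions, adj)      # relation-aware region features
    gru = nn.GRU(input_size=64, hidden_size=64, batch_first=True)
    _, global_emb = gru(local.unsqueeze(0)) # (1, 1, 64) global representation
    print(global_emb.shape)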