2022 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme52920.2022.9859766

Joint Learning of Object Graph and Relation Graph for Visual Question Answering

Cited by 13 publications (4 citation statements)
References 6 publications

“…Exploring further, Graphhopper [56] tackles the complexity of multi-hop reasoning over complex visual scenes to deduce reasoning paths leading to answers in VQA. The Dual Message-passing enhanced GNN (DM-GNN) [63] encodes multi-scale scene graph information into two distinct graphs focusing on objects and relations. This dual structure achieves a balanced representation of object, relation, and attribute features in VQA.…”
Section: Visual Question Answering (mentioning)
confidence: 99%
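The statement above describes DM-GNN as encoding scene-graph information into two separate graphs, one over objects and one over relations. A minimal Python sketch of that dual-graph idea follows. It is not the authors' implementation: the triplet format, the rule that two relations are neighbours when they share an object, and the mean-aggregation update are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed triplet format: (subject_index, predicate, object_index), as a
# scene graph generation (SGG) model might emit for detected objects.


def build_object_graph(num_objects, triplets):
    """Object graph: one node per detected object, an undirected edge per related pair."""
    adj = {i: set() for i in range(num_objects)}
    for subj, _, obj in triplets:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def build_relation_graph(triplets):
    """Relation graph: one node per triplet; two relations are neighbours
    when they share an object (an assumed linking rule)."""
    relations_of = defaultdict(set)  # object index -> indices of relations touching it
    for r, (subj, _, obj) in enumerate(triplets):
        relations_of[subj].add(r)
        relations_of[obj].add(r)
    adj = {r: set() for r in range(len(triplets))}
    for rels in relations_of.values():
        for r in rels:
            adj[r] |= rels - {r}
    return adj


def mean_message_pass(features, adj):
    """One message-passing round: each node becomes the mean of itself and its neighbours."""
    updated = {}
    for node, neighbours in adj.items():
        group = [features[node]] + [features[n] for n in neighbours]
        dim = len(features[node])
        updated[node] = [sum(vec[k] for vec in group) / len(group) for k in range(dim)]
    return updated


if __name__ == "__main__":
    triplets = [(0, "holding", 1), (0, "wearing", 2), (1, "on", 3)]
    object_graph = build_object_graph(4, triplets)
    relation_graph = build_relation_graph(triplets)
    object_feats = [[1.0], [3.0], [5.0], [7.0]]  # toy 1-d object features
    relation_feats = [[0.5], [1.5], [2.5]]       # toy 1-d relation features
    print(mean_message_pass(object_feats, object_graph))
    print(mean_message_pass(relation_feats, relation_graph))
```

Keeping the two graphs separate lets each one aggregate over its own neighbourhood structure; how the two streams are then fused is not described in the excerpt and is left out here.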
“…ReGAT [11] further introduces the semantic position edge relations into the spatial graph, which trains a classifier to predict the spatial relationships between visual objects into 11 specific categories, for example, ‘cover’, ‘intersect’ and ‘no-relation’ for too far pairs. DM-GNN [12] simultaneously introduces the relative position and human behaviour information from the image by utilising the SGG model to construct the visual relationship scene graph. KG-Aug [36] uses the entities of visual objects and key question words to retrieve the corresponding triplets in the ConceptNet, and combines these triplets to construct the knowledge scene graph.…”
Section: Related Work (mentioning)
confidence: 99%
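The KG-Aug step described above (retrieve triplets that mention a detected object or a key question word, then merge them into one graph) can be sketched as follows. The triplet store is a hypothetical toy list standing in for ConceptNet; it does not use the ConceptNet API or reflect its actual contents, and the matching rule (a term equal to the head or tail) is an assumption.

```python
# Hypothetical toy triplet store (head, relation, tail) standing in for ConceptNet.
TOY_TRIPLETS = [
    ("umbrella", "UsedFor", "rain"),
    ("umbrella", "IsA", "tool"),
    ("dog", "IsA", "animal"),
    ("rain", "RelatedTo", "weather"),
    ("car", "AtLocation", "street"),
]


def retrieve_triplets(entities, keywords, store):
    """Keep triplets whose head or tail matches a visual entity or question keyword."""
    terms = set(entities) | set(keywords)
    return [t for t in store if t[0] in terms or t[2] in terms]


def build_knowledge_graph(triplets):
    """Merge retrieved triplets into an adjacency list: head -> [(relation, tail), ...]."""
    graph = {}
    for head, rel, tail in triplets:
        graph.setdefault(head, []).append((rel, tail))
        graph.setdefault(tail, [])  # make sure tail-only entities appear as nodes
    return graph


if __name__ == "__main__":
    hits = retrieve_triplets(["umbrella", "dog"], ["weather"], TOY_TRIPLETS)
    print(build_knowledge_graph(hits))
```

Exact string matching is the simplest possible retrieval rule; a real pipeline would likely need normalisation (lemmatisation, multi-word concepts), which the excerpt does not specify.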
“…Here, the spatial scene graph reliably assists in solving the position information problems, but it fails to introduce the relevant semantic relationships between objects. In addition, DM-GNN [12] utilises the existing Scene Graph Generation [10] (SGG) model to construct a visual relationship based scene graph for one image, which includes the position relationships and semantic relationships about human behaviour. Here, SGG is pre-trained on the large visual relationship benchmark (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…Graph representation learning intends to transform nodes and links on the graph into lower-dimensional vector embeddings, which can be quite challenging due to the complex graph topological structures and node/link attributes. While approaches on static graphs have made breakthroughs and demonstrated distinguishable applicability in various fields (Graepel et al. 2010; He et al. 2014; Li et al. 2022; Zhu et al. 2022), those on temporal graphs are just getting started. Modeling a temporal graph (which may evolve over time with the addition, deletion, and changing of its attributes) is a core problem in developing real-world industrial systems (e.g., social network, citation network, recommendation systems) where many data are time-dependent, and is [Figure 1: An example of temporal graph modeling.]…”
Section: Introduction (mentioning)
confidence: 99%
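The statement above describes a temporal graph as one that evolves through additions and deletions over time. One common representation, assumed here rather than taken from the citing work, is a timestamped event log that is replayed up to a query time to recover a static snapshot. The sketch below handles only undirected edge additions and deletions; attribute changes could be replayed the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeEvent:
    time: float
    op: str  # "add" or "delete"
    u: str
    v: str


def snapshot(events, t):
    """Replay events with time <= t in timestamp order; return the live undirected edge set."""
    edges = set()
    for ev in sorted(events, key=lambda e: e.time):  # stable sort keeps ties in input order
        if ev.time > t:
            break
        key = frozenset((ev.u, ev.v))
        if ev.op == "add":
            edges.add(key)
        elif ev.op == "delete":
            edges.discard(key)
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return edges


if __name__ == "__main__":
    log = [
        EdgeEvent(1.0, "add", "a", "b"),
        EdgeEvent(2.0, "add", "b", "c"),
        EdgeEvent(3.0, "delete", "a", "b"),
    ]
    for t in (0.0, 2.5, 3.0):
        print(t, sorted(tuple(sorted(e)) for e in snapshot(log, t)))
```

Replaying from the start is the simplest design; for long logs, caching periodic snapshots and replaying only the tail would avoid repeated work.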