2022
DOI: 10.48550/arxiv.2205.06530
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering

Abstract: A key challenge in video question answering is how to realize the cross-modal semantic alignment between textual concepts and corresponding visual objects. Existing methods mostly seek to align the word representations with the video regions. However, word representations are often not able to convey a complete description of textual concepts, which are in general described by the compositions of certain words. To address this issue, we propose to first build a syntactic dependency tree for each question with … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 18 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?