2021
DOI: 10.1109/tip.2021.3076556

Adaptive Spatio-Temporal Graph Enhanced Vision-Language Representation for Video QA

Cited by 25 publications (9 citation statements)
References 45 publications
“…Table 7 shows the comparison results on the MSRVTT-QA dataset. It can be observed that our CMCIR performs better than the best performing method ASTG [103], with the highest accuracy of 38.9%. For What, Who, and When question types, the CMCIR performs the best compared with all the previous state-of-the-art methods.…”
Section: Results On Other Benchmark Datasets (mentioning)
confidence: 91%
“…• ASTG [103]: A model that builds spatio-temporal relational graph to adaptively refine the temporal connections for dynamic object representations.…”
Section: Results On Other Benchmark Datasets (mentioning)
confidence: 99%
“…[2] The aforementioned process mainly utilizes verbal and visual data and is vital for integrating them ingeniously. Currently, three typical visual-verbal integration tasks are: captioning, [3,4] visual question answering (VQA), [5][6][7][8][9] and visual dialog. [10] Image captioning is to generate descriptive text based on visual information.…”
Section: Introduction (mentioning)
confidence: 99%
“…information in both spatial and temporal spaces has already become a consensus in recent studies, [6][7][8]11] but attention seems not well explored because it degrades the performance, which is reported in ref. [6].…”
Section: Introduction (mentioning)
confidence: 99%