2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.01046

Mixture-Kernel Graph Attention Network for Situation Recognition

Cited by 24 publications (17 citation statements)
References 19 publications

“…For ImSitu actions we prompt the model with "What are they doing?". GPV-2 gets 34.7 top-5 accuracy compared to 58.6 from the benchmark authors [87], employing a supervised CNN+CRF approach, and 68.6 from a recent supervised model [72] that uses a specialized mixture-kernel attention graph neural network. For verbs present in WEB10K (the Seen column), WEB10K training provides a significant boost (54.4 from 33.4), showing successful transfer from web images to ImSitu images.…”
Section: Methods
confidence: 99%
“…The common pipeline of SR and GSR in the literature [3,4,19,27,29,32,40,41] resembles these two processes: predicting a verb (Glance), then estimating a noun for each role associated with the predicted verb (Gaze). In this pipeline, the correctness of the predicted verb is extremely important, since noun estimation depends entirely on the predicted verb.…”
Section: Glance Gaze
confidence: 99%
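
The Glance-Gaze pipeline quoted above is easy to express in code. Below is a minimal sketch, assuming a hypothetical TwoStageSituationRecognizer with illustrative classifier heads and a verb_to_roles mapping; none of these names, sizes, or design details come from the cited papers.

```python
import torch
import torch.nn as nn


class TwoStageSituationRecognizer(nn.Module):
    """Sketch of the common SR pipeline: Glance (verb) then Gaze (nouns)."""

    def __init__(self, num_verbs, num_nouns, verb_to_roles, feat_dim=2048):
        super().__init__()
        self.verb_to_roles = verb_to_roles                # verb id -> role ids
        self.verb_head = nn.Linear(feat_dim, num_verbs)   # Glance classifier
        num_roles = 1 + max(r for roles in verb_to_roles.values() for r in roles)
        self.role_embed = nn.Embedding(num_roles, feat_dim)
        self.noun_head = nn.Linear(feat_dim, num_nouns)   # Gaze classifier

    def forward(self, img_feat):                          # img_feat: (B, feat_dim)
        # Glance: predict the verb from global image features.
        verb = self.verb_head(img_feat).argmax(dim=-1)                # (B,)
        # Gaze: predict a noun for each role of the predicted verb.
        # The role set, and hence every noun, depends on the verb.
        nouns = []
        for b, v in enumerate(verb.tolist()):
            roles = torch.tensor(self.verb_to_roles[v])
            role_feat = img_feat[b] * self.role_embed(roles)          # (R, feat_dim)
            nouns.append(self.noun_head(role_feat).argmax(dim=-1))    # (R,)
        return verb, nouns


# Toy usage: two verbs, each with its own role set.
model = TwoStageSituationRecognizer(
    num_verbs=2, num_nouns=50, verb_to_roles={0: [0, 1], 1: [0, 2, 3]})
verb, nouns = model(torch.randn(4, 2048))
```

Because the Gaze stage indexes the role set by the predicted verb, a wrong verb invalidates every downstream noun prediction, which is exactly the sensitivity the quoted statement points out.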
“…Inspired by the image captioning task, Mallya and Lazebnik [15] adopt a Recurrent Neural Network (RNN) architecture to model the relations in a predefined order. Li et al. [9] use a Gated Graph Neural Network (GGNN) [10] to capture relations among roles, and Suhail and Sigal [20] propose a modified GGNN to learn context-aware relations among roles depending on the content of the image. Cooray et al. [2] formulate relation modeling as an interdependent, query-based visual reasoning problem.…”
Section: Related Work
confidence: 99%
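
As a companion to the GGNN-based relation modeling mentioned above (Li et al. [9]), here is a minimal, hedged sketch of GGNN-style message passing among role nodes; RoleGGNN, hidden_dim, and steps are illustrative assumptions, not details from the cited papers.

```python
import torch
import torch.nn as nn


class RoleGGNN(nn.Module):
    """GGNN-style message passing among semantic-role nodes (illustrative)."""

    def __init__(self, hidden_dim=512, steps=4):
        super().__init__()
        self.steps = steps
        self.msg = nn.Linear(hidden_dim, hidden_dim)   # message transform
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)  # gated node-state update

    def forward(self, h):                              # h: (num_roles, hidden_dim)
        n = h.size(0)
        for _ in range(self.steps):
            m = self.msg(h)                            # per-node outgoing message
            # Each node aggregates the mean message from all *other* nodes
            # (roles form a fully connected graph).
            agg = (m.sum(dim=0, keepdim=True) - m) / max(n - 1, 1)
            h = self.gru(agg, h)                       # update every role state
        return h                                       # context-aware role states


# Toy usage: six role nodes initialized from image/role features.
roles = RoleGGNN()(torch.randn(6, 512))
```

Each propagation step lets every role node absorb context from the others, which is what makes the resulting role states context-aware in the sense of Suhail and Sigal [20].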