2020
DOI: 10.48550/arxiv.2004.07485
Preprint

Asynchronous Interaction Aggregation for Action Detection

Cited by 6 publications (9 citation statements)
References 26 publications
“…We employ the Asynchronous Interaction Aggregation (AIA) network [25] as the main action detection model for this task. The backbone of the model is SlowFast 8×8 ResNet-101 (SlowFast8x8-R101) [6] which is pretrained on the Kinetics-700 dataset [2].…”
Section: Action Detection Model
confidence: 99%
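The SlowFast backbone referenced in this statement samples each input clip at two temporal resolutions. A minimal sketch of that sampling scheme, assuming a 64-frame clip with slow-pathway stride τ = 8 and speed ratio α = 8 (the defaults reported for the SlowFast 8×8 configuration; the function name is our own, not the authors' API):

```python
def slowfast_sample(clip_len=64, tau=8, alpha=8):
    """Return frame indices for the slow and fast pathways.

    The slow pathway samples sparsely (stride tau); the fast pathway
    samples alpha times more densely (stride tau // alpha).
    """
    slow = list(range(0, clip_len, tau))           # 8 frames when clip_len=64, tau=8
    fast = list(range(0, clip_len, tau // alpha))  # stride 1 -> all 64 frames
    return slow, fast
```

The two index lists would be used to gather frames before feeding the respective pathways; in SlowFast the fast pathway compensates for its denser sampling with a much smaller channel width.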
“…The whole AIA model (with dense serial Interaction Aggregation) is then trained on the AVA dataset [9], a large-scale spatiotemporal action localization dataset. We take the AVA-trained model open-sourced by the authors of AIA [25] as our base model. Since the actions in HIE are all human-centric and, unlike the actions in AVA, involve no interactions with other objects, we remove the person-object interaction modules from the AIA.…”
Section: Action Detection Model
confidence: 99%
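The statement above describes dropping AIA's person-object interaction branch for the human-centric HIE data. A hedged sketch of how such a switch might be wired up, given that AIA aggregates person-person, person-object, and long-term temporal interaction blocks (the function and flag names below are our own illustration, not the authors' API):

```python
def build_interaction_blocks(use_person=True, use_object=True, use_memory=True):
    """Assemble the list of interaction-block types to instantiate.

    Mirrors the idea of removing the person-object branch when the
    target dataset (e.g. HIE) contains no object interactions.
    """
    blocks = []
    if use_person:
        blocks.append("person")   # person-person interaction
    if use_object:
        blocks.append("object")   # person-object interaction
    if use_memory:
        blocks.append("memory")   # long-term temporal interaction
    return blocks

# For HIE: drop the object branch, keep person and memory interactions.
hie_blocks = build_interaction_blocks(use_object=False)
```

Gating whole interaction branches behind flags like this keeps the remaining aggregation structure unchanged while avoiding the cost of object detection on datasets where it adds nothing.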
“…In sequential problems of NLP (Bahdanau, Cho, and Bengio 2014; Vaswani et al. 2017; Lin et al. 2017b; Xu et al. 2015), attention mechanisms are widely adopted in recurrent neural networks (RNN) (Pang et al. 2019), Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber 1997), SCS (Pang et al. 2020b), and the Transformer (Vaswani et al. 2017) to capture the relationships between words or sentences. In computer vision, many tasks like fine-grained recognition (Fu, Zheng, and Mei 2017; Wang et al. 2015; Fang et al. 2018; Pang et al. 2020c), image captioning (Anderson et al. 2018; Anne Hendricks et al. 2016; Xu et al. 2015), classification (Mnih et al. 2014; Hu, Shen, and Sun 2018; Woo et al. 2018; Wang et al. 2017; Tang et al. 2020), and segmentation (Ren and Zemel 2017; Chen et al. 2016; Cao et al. 2020) also utilize attention mechanisms based on soft attention maps or bounding boxes to search for salient areas. Moreover, self-attention structures (Wang et al. 2018; Zhu et al. 2019; Huang et al. 2018; Dai et al. 2019), which compute combination weights over elements (pixels in vision), are another attention method, adopting an adjacency matrix to represent attention.…”
Section: Related Work
confidence: 99%
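The self-attention structures mentioned above compute a matrix of pairwise combination weights over elements, the "adjacency matrix" the quote refers to. A minimal NumPy sketch of scaled dot-product self-attention; the projection matrices stand in for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the rows of X.

    X: (n, d) elements; Wq/Wk/Wv: (d, dk) projection matrices.
    Returns the attended features (n, dk) and the attention
    matrix A (n, n), whose rows each sum to 1.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # row-wise softmax
    return A @ V, A
```

Row i of A gives the weight element i places on every other element, which is why self-attention is naturally read as a dense, learned adjacency matrix over pixels or tokens.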