stagNet: An Attentive Semantic RNN for Group Activity Recognition

Qi, Mingjing; Qin, Jie; Li, Annan; Wang, Yunhong; Luo, Jiebo; Gool, Luc Van

doi:10.1007/978-3-030-01249-6_7

Cited by 117 publications

(97 citation statements)

References 40 publications

“…
Method | Backbone | Group activity | Individual action
HDTM [24] | AlexNet | 81.9% | -
CERN [45] | VGG16 | 83.3% | -
stagNet (GT) [39] | VGG16 | 89.3% | -
stagNet (PRO) [39] | VGG16 | 87.6% | -
HRN [23] | VGG19 | 89.5% | -
SSU (GT) [3] | Inception-v3 | 90.6% | 81.8%
SSU (PRO) [3] | Inception…”

Section: Methods (mentioning)

confidence: 99%

“…
Method | Backbone | Group activity
SIM [12] | AlexNet | 81.2%
HDTM [24] | AlexNet | 81.5%
Cardinality Kernel [17] | None | 83.4%
SBGAR [32] | Inception-v3 | 86.1%
CERN [45] | VGG16 | 87.2%
stagNet (GT) [39] | VGG16 | 89.1%
stagNet (PRO) [

[3], and outperforms it by about 2% on group activity recognition accuracy, since our model can capture and exploit the relation information among actors. And, we also achieve better performance on individual action recognition task.…”

Section: Methods (mentioning)

confidence: 99%

“…The earlier approaches are mostly based on a combination of hand-crafted visual features with probability graphical models [1,31,30,43,6,8,17] or AND-OR grammar models [2,46]. Recently, the wide adoption of deep convolutional neural networks (CNNs) has demonstrated significant performance improvements on group activity recognition [3,24,41,45,12,32,59,23,39]. Ibrahim et al. [24] designed a two-stage deep temporal model, which builds an LSTM model to represent action dynamics of individual people and another LSTM model to aggregate person-level information.…”

Section: Related Work (mentioning)

confidence: 99%

“…Ibrahim et al. [23] proposed a hierarchical relational network that builds a relational representation for each person. There are also efforts that explore modeling the scene context via structured recurrent neural networks [12,59,39] or generating captions [32]. Our work differs from these approaches in that it explicitly models interaction information by building a flexible and interpretable ARG.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recent deep learning methods have shown promising results for group activity recognition in videos [3,24,45,12,32,59,23,39]. Typically, these methods follow a two-stage recognition pipeline.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Actor Relation Graphs for Group Activity Recognition

Wang et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Modeling relation between actors is important for recognizing group activity in a multi-person scene. This paper aims at learning discriminative relation between actors efficiently using deep models. To this end, we propose to build a flexible and efficient Actor Relation Graph (ARG) to simultaneously capture the appearance and position relation between actors. Thanks to the Graph Convolutional Network, the connections in ARG could be automatically learned from group activity videos in an end-to-end manner, and the inference on ARG could be efficiently performed with standard matrix operations. Furthermore, in practice, we come up with two variants to sparsify ARG for more effective modeling in videos: spatially localized ARG and temporal randomized ARG. We perform extensive experiments on two standard group activity recognition datasets: the Volleyball dataset and the Collective Activity dataset, where state-of-the-art performance is achieved on both datasets. We also visualize the learned actor graphs and relation features, which demonstrate that the proposed ARG is able to capture the discriminative relation information for group activity recognition.
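The abstract's remark that inference on the graph reduces to standard matrix operations can be illustrated with a small sketch. This is not the authors' implementation: the scaled dot-product similarity, the hard distance threshold used to mimic a spatially localized graph, and the names `relation_graph`, `graph_conv`, and `max_dist` are all assumptions made for illustration, and the weight matrix would normally be learned rather than supplied by hand.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_graph(features, positions, max_dist):
    """Build a row-normalised actor relation matrix (hypothetical form).

    Edge scores use scaled dot-product appearance similarity (an assumption);
    actor pairs farther apart than max_dist are masked out, which mimics a
    spatially localized graph.
    """
    n, d = len(features), len(features[0])
    graph = []
    for i in range(n):
        scores = []
        for j in range(n):
            if math.dist(positions[i], positions[j]) > max_dist:
                scores.append(float("-inf"))  # masked: no edge
            else:
                dot = sum(a * b for a, b in zip(features[i], features[j]))
                scores.append(dot / math.sqrt(d))
        graph.append(softmax(scores))
    return graph


def graph_conv(graph, features, weight):
    """One graph-convolution step, ReLU(G @ X @ W), written with plain lists."""
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Aggregate neighbour features according to the relation weights: G @ X.
    agg = [[sum(graph[i][k] * features[k][c] for k in range(n))
            for c in range(d_in)] for i in range(n)]
    # Project with W and apply ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]


if __name__ == "__main__":
    # Three actors with 2-D appearance features; the third stands far away.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pos = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    g = relation_graph(feats, pos, max_dist=2.0)
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix
    print(graph_conv(g, feats, identity))
```

In this toy setup the distant actor has no edges to the other two, so its updated feature depends only on itself, while the two nearby actors exchange information through the relation weights.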

Section: Methods (mentioning)

confidence: 99%