Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3416276

Person-level Action Recognition in Complex Events via TSD-TSM Networks

Abstract: The task of person-level action recognition in complex events aims to densely detect pedestrians and individually predict their actions from surveillance videos. In this paper, we present a simple yet efficient pipeline for this task, referred to as TSD-TSM networks. First, we adopt the TSD detector for pedestrian localization on each keyframe. Second, we generate the sequential ROIs for a person proposal by replicating the adjusted bounding box coordinates around the keyframe. Particularly, we …
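
The second step of the abstract (replicating one adjusted keyframe box across neighboring frames) can be illustrated with a minimal sketch. The abstract is truncated and does not say how the box is adjusted or how many frames are used, so everything here is an assumption for illustration: the adjustment is taken to be a center-preserving enlargement clipped to the frame, and the names `expand_box`, `sequential_rois`, `radius`, and `scale` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def expand_box(box: Box, scale: float, width: int, height: int) -> Box:
    """Enlarge a box around its center by `scale`, then clip it to the frame.

    Assumption: the paper's "adjusted" box is modeled here as a simple enlargement.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(width), cx + half_w),
        min(float(height), cy + half_h),
    )


def sequential_rois(
    box: Box,
    keyframe: int,
    num_frames: int,
    radius: int,
    scale: float,
    width: int,
    height: int,
) -> List[Tuple[int, Box]]:
    """Copy one adjusted keyframe box to every frame within `radius` of the keyframe.

    Returns (frame_index, box) pairs; frame indices are clamped to the clip bounds.
    """
    adjusted = expand_box(box, scale, width, height)
    start = max(0, keyframe - radius)
    end = min(num_frames - 1, keyframe + radius)
    return [(t, adjusted) for t in range(start, end + 1)]


# Example: a detection on frame 5 of an 8-frame clip, replicated over +/- 3 frames.
rois = sequential_rois((10, 10, 30, 50), keyframe=5, num_frames=8,
                       radius=3, scale=1.5, width=100, height=100)
```

Under these assumptions every frame in the window shares the same coordinates, so no per-frame tracking is needed; an enlarged box is one plausible way to tolerate small motion of the person around the keyframe.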

Citations: cited by 10 publications (2 citation statements)
References: 9 publications
“…Convolutional Neural Networks for video understanding have been extensively studied and widely applied to video-text pre-training [36], cross-modal analysis [29], video detection [19], ecommerce [6,7,8], adversarial attack [4,53,54,55], interactive search [37,58], retrieval [17,18,26,57,62,64,66], hyperlinking [9,20,21,39], and caption [2,47,61], in the CNN era; we select and review representative 3D-CNNs as follows. C3D [48] is a pure 3D-CNN pilot based on a new 3D Conv operator and easily outperforms 2D counterparts on video tasks.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The deep neural networks have achieved considerable success in the scenario where the training and testing data are sampled from an identical distribution [2,13]. However, in many real applications, the assumption cannot be satisfied due to the existence of unknowns.…”
Section: Introduction (citation type: mentioning)
confidence: 99%