2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.01021

A Structured Model for Action Detection

Abstract: A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition, or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand. While this is an obviously attractive approach, it is not applicable in all scenarios. We claim that action detection is one such challenging problem: the models that need to be trained are large, and the labeled data is expensive to obtain. To add…

Cited by 89 publications (48 citation statements)
References 51 publications
“…People and objects interacting with people should be the focus of activity recognition. Zhang et al [21] [28] object detection algorithms in deep learning show good performance. Overall, the two-stage object detection framework is more accurate, but the one-stage YOLO has faster, even real-time inference speeds with guaranteed accuracy, making it well suited for engineering practice.…”
Section: Object Detection
Citation type: mentioning
confidence: 99%
“…While action recognition considers the classification task, action detection carries out both classification and detection of actions. Although action detection is usually addressed using full supervision [7,9,23,25,30,33], we are interested in weak supervision, which allows us to reduce annotation cost by training models using very few action bounding box annotations per action instance.…”
Section: Action Recognition and Detection
Citation type: mentioning
confidence: 99%
“…Recently, GCN have been used for visual relational reasoning for the tasks of action recognition [29] and group activity recognition [32]. Zhang et al [33] employ GCN [12] for action detection, where nodes represent detected actors and objects. As we do not require an external object detector, our approach is suitable to reason with respect to arbitrary context and objects which cannot be detected, e.g.…”
Section: Visual Relational Reasoning
Citation type: mentioning
confidence: 99%
“…The recent surge of new methods in this context include various methods based on graph-based representations of videos and actions [87,161] and deep neural networks [8,82,219]. Most of those methods achieve impressive results for a series of challenging tasks, such as action detection [240,241], human-object interaction understanding [64,126,161,196,224] and relational reasoning in videos [8,244], action similarity assessment [162,215] and skill determination [52,53,120].…”
Section: Relies On Editing and Analyzing Visual Data
Citation type: mentioning
confidence: 99%