2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00629

DynamoNet: Dynamic Action and Motion Network

Abstract: In this paper, we are interested in self-supervised learning of motion cues in videos using dynamic motion filters, aiming at a better motion representation that ultimately boosts human action recognition. Thus far, the vision community has focused on spatio-temporal approaches with standard, fixed filters; we instead propose dynamic filters that adaptively learn a video-specific internal motion representation by predicting short-term future frames. We name this new motion representation the dynamic motion representation…
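The idea summarized in the abstract can be sketched in a few lines of PyTorch: a small filter-generating network predicts one kernel per video (here, per sample) from the current frame, the kernel is convolved with that frame to predict the next frame, and the reconstruction error supplies the self-supervised signal. This is an illustrative sketch under assumed settings (kernel size, generator architecture, L2 loss), not the authors' implementation; names such as FilterGenerator and predict_next_frame are hypothetical.

```python
# Illustrative sketch of dynamic motion filters for self-supervised
# future-frame prediction (assumed design, not the DynamoNet code).
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 7  # assumed kernel size; the paper's exact setting may differ


class FilterGenerator(nn.Module):
    """Predicts one K x K filter per input frame (hypothetical architecture)."""

    def __init__(self, in_channels=3, k=K):
        super().__init__()
        self.k = k
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_filter = nn.Linear(64, k * k)

    def forward(self, frame):                      # frame: (B, C, H, W)
        b = frame.size(0)
        feat = self.features(frame).flatten(1)     # (B, 64)
        w = self.to_filter(feat)                   # (B, K*K)
        w = torch.softmax(w, dim=1)                # normalize each kernel
        return w.view(b, 1, self.k, self.k)


def predict_next_frame(frame, kernels):
    """Convolve each sample with its own kernel via the grouped-conv trick."""
    b, c, h, w = frame.shape
    k = kernels.size(-1)
    weight = kernels.repeat_interleave(c, dim=0)   # (B*C, 1, K, K)
    x = frame.reshape(1, b * c, h, w)
    out = F.conv2d(x, weight, padding=k // 2, groups=b * c)
    return out.reshape(b, c, h, w)


# Self-supervised objective: the predicted frame t+1 should match the real one.
gen = FilterGenerator()
frame_t = torch.randn(2, 3, 64, 64)
frame_t1 = torch.randn(2, 3, 64, 64)
pred_t1 = predict_next_frame(frame_t, gen(frame_t))
loss = F.mse_loss(pred_t1, frame_t1)               # e.g. an L2 reconstruction loss
loss.backward()
```

In the full model the learned motion representation also feeds the action-recognition branch; only the future-frame prediction objective is sketched here.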

Cited by 115 publications (54 citation statements) · References 49 publications
“…The superior results in Table 4 show that we achieve comparable performance, 75.7% top-1 and 93.8% top-5 accuracy on the validation set, outperforming our baseline ECO-Lite-EN [17] by 5.7% top-1 and 4.4% top-5 accuracy. It also outperforms the recent works STM [31] and DynamoNet-32F (ResNext101) [29] by 2.0% and 7.5% top-1 accuracy, and by 2.2% and 5.7% top-5 accuracy, respectively.…”
Section: Results on Kinetics-400 (citation type: mentioning)
Confidence: 62%
“…Specifically, our SAST-EN significantly outperforms our baseline ECO-Lite-EN [17] by 1.6% on UCF101 and by 2.7% on HMDB51. It also outperforms the recent works STM [31], DistInit [28] and DynamoNet-32F (ResNext101) [29] by 0.2%, 10.6% and 3.3% on UCF101, and by 2.9%, 20.3% and 6.6% on HMDB51, respectively. Note that SAST-EN denotes the average score of an ensemble of SAST networks with {16, 20, 24, 32} input frames, similar to ECO-Lite-EN [17].…”
Section: Performance Comparison, 1) Results on UCF101 and HMDB51 (citation type: mentioning)
Confidence: 66%
“…CondConv [30] improves model capacity by increasing the size and complexity of the kernel-generating function. Owing to these advantages, dynamic filter networks have been applied in many areas, such as human action recognition [7] and super-resolution [29].…”
Section: Dynamic Filter Network (citation type: mentioning)
Confidence: 99%
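For context, a conditionally parameterized convolution of the kind referenced in the statement above can be sketched as follows: an input-dependent routing function mixes a set of expert kernels into one kernel per example, so capacity grows with the number of experts while the per-example convolution cost stays close to that of a single kernel. This is a simplified, assumed sketch, not the CondConv reference implementation; CondConv2d is a hypothetical class name.

```python
# Simplified sketch of a conditionally parameterized (CondConv-style) layer
# (assumed design): per-example routing weights mix a bank of expert kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CondConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, num_experts=4):
        super().__init__()
        self.k = k
        # Expert kernels: (E, out_ch, in_ch, K, K)
        self.experts = nn.Parameter(0.02 * torch.randn(num_experts, out_ch, in_ch, k, k))
        self.route = nn.Linear(in_ch, num_experts)  # the kernel-generating function

    def forward(self, x):                                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Routing weights from globally pooled features, one set per example.
        r = torch.sigmoid(self.route(x.mean(dim=(2, 3))))           # (B, E)
        # Mix the experts into one kernel per example.
        weight = torch.einsum('be,eoikl->boikl', r, self.experts)   # (B, O, I, K, K)
        out_ch = weight.size(1)
        # Apply each example's kernel with a grouped convolution.
        x = x.reshape(1, b * c, h, w)
        weight = weight.reshape(b * out_ch, c, self.k, self.k)
        out = F.conv2d(x, weight, padding=self.k // 2, groups=b)
        return out.reshape(b, out_ch, h, w)


# Usage: from the caller's perspective the layer behaves like a regular Conv2d.
layer = CondConv2d(3, 16)
y = layer(torch.randn(2, 3, 32, 32))                                # -> (2, 16, 32, 32)
```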
“…Recently, a few works have focused on exploiting temporal information by concatenating multiple frames at the input, such as Sun et al. (2015) and Diba et al. (2019). The problem with these approaches lies in their inability to scale well to long sequences.…”
Section: Exploiting Previous Frames' Information (citation type: mentioning)
Confidence: 99%