2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.223

Predictive-Corrective Networks for Action Detection

Abstract: While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based on those for static images, potentially underutilizing rich video information. In this work, we rethink both the underlying network architecture and the stochastic learning paradigm for temporal data. To do so, we draw inspiration from classic theory on linear dynamic systems for modeling time ser…
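
The abstract attributes the design to classic linear dynamic systems theory for time series. As background only, here is a minimal scalar Kalman filter, the textbook predict/correct loop for a linear dynamic system. It is not the paper's network; the parameter names (a, q, r), their default values, and the identity observation model are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScalarKalman:
    """Textbook 1-D Kalman filter with an identity observation model (assumed)."""
    a: float = 1.0   # state transition: x_t = a * x_{t-1} + process noise
    q: float = 1e-2  # process-noise variance (assumed value)
    r: float = 1e-1  # observation-noise variance (assumed value)
    x: float = 0.0   # current state estimate
    p: float = 1.0   # variance of the current estimate

    def predict(self) -> float:
        # Predict: push the previous estimate through the linear dynamics.
        self.x = self.a * self.x
        self.p = self.a * self.p * self.a + self.q
        return self.x

    def correct(self, z: float) -> float:
        # Correct: move the prediction toward the observation z by the
        # Kalman gain k, which weighs prediction vs. observation uncertainty.
        k = self.p / (self.p + self.r)
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * self.p
        return self.x


def filter_series(zs: Sequence[float], **params: float) -> List[float]:
    """Run predict/correct over a sequence of scalar observations."""
    kf = ScalarKalman(**params)
    estimates = []
    for z in zs:
        kf.predict()
        estimates.append(kf.correct(z))
    return estimates
```

For example, `filter_series([5.0] * 50)` converges toward 5.0; lowering `r` relative to `q` makes the filter trust each new observation more.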

Cited by 34 publications (26 citation statements). References: 44 publications.

“…Single-frame CNN [38]: 34.7; Two-stream CNN [37]: 36.2; C3D + LinearInterp [34]: 37.0; Predictive-corrective [10]: 38.9; LSTM [14]: 39. … [32] are much larger when the input contains sensor data.…”
Section: Methods Map (mentioning)
confidence: 99%
“…Another line of work generates frame-wise or snippet-wise action labels, and uses these labels to define the temporal boundaries of actions [29,38,10,26,55,20]. One major challenge here is to enable temporal contextual reasoning in predicting the individual labels.…”
Section: Related Work (mentioning)
confidence: 99%
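
The excerpt above ends with turning frame-wise labels into temporal boundaries. A minimal sketch of that last step, assuming labels are already given per frame and that a hypothetical `background` value marks frames with no action; the function name `labels_to_segments` is illustrative and not taken from any cited work.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def labels_to_segments(
    labels: Sequence[Optional[str]],
    background: Optional[str] = None,
) -> List[Tuple[str, int, int]]:
    """Group consecutive identical frame labels into (label, start, end) segments.

    `end` is exclusive; runs of the (assumed) background label are dropped.
    """
    segments = []
    t = 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)  # length of this run of identical labels
        if label != background:
            segments.append((label, t, t + n))
        t += n
    return segments
```

In practice the per-frame labels would come from something like an argmax over per-frame scores; isolated misclassified frames then split a single action into several segments, which is the motivation for the contextual reasoning the next excerpt discusses.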
“…One major challenge here is to enable temporal contextual reasoning in predicting the individual labels. Lea et al [26] proposed novel temporal convolutional architectures to capture long-range temporal dependencies, while others [29,38,10] use recurrent neural networks. A few other methods add a separate contextual reasoning stage on top of the frame-wise or snippet-wise prediction scores to explicitly model action durations or temporal transitions [34,55,20].…”
Section: Related Work (mentioning)
confidence: 99%
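
The excerpt mentions a separate contextual reasoning stage on top of frame-wise scores. The cited methods model durations or transitions; the sketch below deliberately swaps in something far simpler, a centered moving average followed by a threshold, only to show how neighboring frames can override an isolated low score. `smooth_scores`, `threshold`, `window` and `tau` are hypothetical names and defaults.

```python
from typing import List, Sequence


def smooth_scores(scores: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average of per-frame scores (window truncated at the ends)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for t in range(len(scores)):
        lo = max(0, t - half)
        hi = min(len(scores), t + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def threshold(scores: Sequence[float], tau: float = 0.5) -> List[bool]:
    """Mark a frame as containing the action when its score reaches tau."""
    return [s >= tau for s in scores]
```

With raw scores `[0.9, 0.9, 0.1, 0.9, 0.9]`, thresholding alone splits the action at frame 2, while smoothing first keeps it as one run.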
“…Sigurdsson et al [27] used a fully-connected temporal CRF model for reasoning over various aspects of activities. Dave et al [10] employed RNNs to sequentially make top-down predictions and later then corrected them by bottom-up observations. Most recently, Sigurdsson et al [28] performed a detailed analysis on what kinds of information are needed to achieve substantial gains for activity understanding among objects, verbs, intent, and sequential reasoning.…”
Section: Related Work (mentioning)
confidence: 99%
“…Comparison with state-of-the-art. On Charades, we compare with several state-of-the-art methods such as [29,35,38,13,27,15,10]. Our results are shown in Table 6 Table 7.…”
Section: Multi-label Action Recognition (mentioning)
confidence: 99%
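
Charades multi-label results are conventionally reported as mean average precision (mAP) over action classes. The sketch below computes a generic uninterpolated AP per class and averages over classes that have at least one positive; the official Charades evaluation code may differ in details such as tie handling, so treat this as an illustration rather than a reference implementation. The function names are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Uninterpolated AP for one class: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if targets[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    target_matrix: Sequence[Sequence[int]],
) -> float:
    """score_matrix[n][c] is the score of class c for video n; targets are 0/1.

    Classes with no positive videos are skipped (an assumed convention).
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_targets = [row[c] for row in target_matrix]
        if not any(col_targets):
            continue
        col_scores = [row[c] for row in score_matrix]
        aps.append(average_precision(col_scores, col_targets))
    return sum(aps) / len(aps)
```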