Predictive-Corrective Networks for Action Detection

Dave, Achal; Russakovsky, Olga; Ramanan, Deva

doi:10.1109/cvpr.2017.223

Cited by 34 publications

(26 citation statements)

References 44 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Single-frame CNN [38] 34.7 Two-stream CNN [37] 36.2 C3D + LinearInterp [34] 37.0 Predictive-corrective [10] 38.9 LSTM [14] 39. [32] are much larger when the input contains sensor data.…”

Section: Methods Mapmentioning

confidence: 99%

Temporal Recurrent Networks for Online Action Detection

Gao

Chen

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

161

216

View full text Add to dashboard Cite

Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed. However, important real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this paper, we propose a novel framework, Temporal Recurrent Network (TRN), to model greater temporal context of a video frame by simultaneously performing online action detection and anticipation of the immediate future. At each moment in time, our approach makes use of both accumulated historical evidence and predicted future information to better recognize the action that is currently occurring, and integrates both of these into a unified end-to-end architecture. We evaluate our approach on two popular online action detection datasets, HDD and TVSeries, as well as another widely used dataset, THUMOS'14. The results show that TRN significantly outperforms the state-of-the-art.

show abstract

“…Single-frame CNN [38] 34.7 Two-stream CNN [37] 36.2 C3D + LinearInterp [34] 37.0 Predictive-corrective [10] 38.9 LSTM [14] 39. [32] are much larger when the input contains sensor data.…”

Section: Methods Mapmentioning

confidence: 99%

Temporal Recurrent Networks for Online Action Detection

Gao

Chen

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

161

216

View full text Add to dashboard Cite

show abstract

“…Another line of work generates frame-wise or snippetwise action labels, and uses these labels to define the temporal boundaries of actions [29,38,10,26,55,20]. One major challenge here is to enable temporal contextual reasoning in predicting the individual labels.…”

Section: Related Workmentioning

confidence: 99%

“…One major challenge here is to enable temporal contextual reasoning in predicting the individual labels. Lea et al [26] proposed novel temporal convolutional architectures to capture longrange temporal dependencies, while others [29,38,10] use recurrent neural networks. A few other methods add a separate contextual reasoning stage on top of the frame-wise or snippet-wise prediction scores to explicitly model action durations or temporal transitions [34,55,20].…”

Section: Related Workmentioning

confidence: 99%

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Chao

Vijayanarasimhan

Seybold

et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

663

418

View full text Add to dashboard Cite

We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key shortcomings of existing approaches: (1) we improve receptive field alignment using a multi-scale architecture that can accommodate extreme variation in action durations;(2) we better exploit the temporal context of actions for both proposal generation and action classification by appropriately extending receptive fields; and (3) we explicitly consider multi-stream feature fusion and demonstrate that fusing motion late is important. We achieve state-ofthe-art performance for both action proposal and localization on THUMOS'14 detection benchmark and competitive performance on ActivityNet challenge.

show abstract

“…Sigurdsson et al [27] used a fully-connected temporal CRF model for reasoning over various aspects of activities. Dave et al [10] employed RNNs to sequentially make top-down predictions and later then corrected them by bottom-up observations. Most recently, Sigurdsson et al [28] performed a detailed analysis on what kinds of information are needed to achieve substantial gains for activity understanding among objects, verbs, intent, and sequential reasoning.…”

Section: Related Workmentioning

confidence: 99%

“…Comparison with state-of-the-art. On Charades, we compare with several state-of-the-art methods such as [29,35,38,13,27,15,10]. Our results are shown in Table 6 Table 7.…”

Section: Multi-label Action Recognitionmentioning

confidence: 99%

TAN: Temporal Aggregation Network for Dense Multi-Label Action Recognition

Dai

Singh

et al. 2019

2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

View full text Add to dashboard Cite

We present Temporal Aggregation Network (TAN) which decomposes 3D convolutions into spatial and temporal aggregation blocks. By stacking spatial and temporal convolutions repeatedly, TAN forms a deep hierarchical representation for capturing spatio-temporal information in videos. Since we do not apply 3D convolutions in each layer but only apply temporal aggregation blocks once after each spatial downsampling layer in the network, we significantly reduce the model complexity. The use of dilated convolutions at different resolutions of the network helps in aggregating multi-scale spatio-temporal information efficiently. Experiments show that our model is well suited for dense multi-label action recognition, which is a challenging subtopic of action recognition that requires predicting multiple action labels in each frame. We outperform state-of-theart methods by 5% and 3% on the Charades and Multi-THUMOS dataset respectively.

show abstract

Predictive-Corrective Networks for Action Detection

Cited by 34 publications

References 44 publications

Temporal Recurrent Networks for Online Action Detection

Temporal Recurrent Networks for Online Action Detection

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

TAN: Temporal Aggregation Network for Dense Multi-Label Action Recognition

Contact Info

Product

Resources

About