2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.599

Asynchronous Temporal Fields for Action Recognition

Abstract: Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as the higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities that includes objects, actions, and intentions, where the potentials ar…

citations
Cited by 138 publications
(77 citation statements)
references
References 59 publications
“…60; DAP [13]: 134.1; Single-stream R-C3D (Titan X Maxwell) [32]: 569; Single-stream R-C3D (Titan X Pascal) [32]: 1030; Two-stream R-C3D (Concat) (Titan X Pascal)*: 656; Two-stream R-C3D (Sum) (Titan X Pascal)*: 642; Single-stream R-C3D (OHEM) (Titan X Pascal): 1030. …aged across 20 frames leading to more spatial consistency. As shown in Table VIII, our jointly optimized single stream model outperforms the asynchronous temporal fields model [61] as well as several baselines reported in the same paper [61]. While the improvement over the standard method is as high as 2.8%, the improvement after the post-processing is not as high.…”
Section: Fps S-cnn [3] (mentioning)
confidence: 59%
“…Table VIII provides a comparative evaluation with various baseline models reported in [61]. This approach [61] trains a CRF based video classification model (asynchronous temporal fields) and evaluates the prediction performance on 25 equidistant frames by making a multi-label prediction for each frame.…”
Section: Experiments On Charades (mentioning)
confidence: 99%
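
The statement above describes evaluation on 25 equidistant frames with a multi-label prediction per frame. A minimal sketch of that sampling step follows, assuming "equidistant" means evenly spaced indices that include the first and last frame. The function names, the rounding rule, and the 0.5 threshold are illustrative assumptions, not taken from the cited papers.

```python
def equidistant_frame_indices(num_frames, num_points=25):
    """Return num_points frame indices spread evenly over [0, num_frames - 1].

    Assumption: endpoints are included and fractional positions are rounded.
    For videos shorter than num_points frames, indices will repeat.
    """
    if num_frames <= 0 or num_points <= 0:
        raise ValueError("num_frames and num_points must be positive")
    if num_points == 1:
        return [0]
    step = (num_frames - 1) / (num_points - 1)
    return [round(i * step) for i in range(num_points)]


def multilabel_at_frames(frame_scores, indices, threshold=0.5):
    """For each sampled frame, return the set of class ids scoring above threshold.

    frame_scores[t][c] is a hypothetical per-frame, per-class score.
    """
    return [
        {c for c, s in enumerate(frame_scores[i]) if s > threshold}
        for i in indices
    ]
```

Because each frame keeps a set of labels rather than a single argmax, several actions can be predicted as active at the same time, which is what "multi-label" refers to here.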
“…For example, on HMDB-51, our results are about 2-3% better than the next best method without IDT-FV. On Charades, we outperform previous methods by about 3% while faring well on the detection task against [81]. We also demonstrate significant performance (about 3-4%) improvement on NTU-RGBD and marginally better performance on MSR datasets on skeleton-based action recognition.…”
Section: Comparisons To the State Of The Art (mentioning)
confidence: 66%
“…Following the evaluation protocol in [25], we use the output probability of the classifier to be the score of the sequence. In the detection task, we consider the evaluation method with post-processing proposed in [81], which uses the averaged prediction score of a temporal window around each temporal pivots. Instead of average pooling, we apply the SVMP.…”
Section: Action Recognition/detection In Untrimmed Videos (mentioning)
confidence: 99%
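
The post-processing mentioned above averages prediction scores over a temporal window around each pivot. A rough sketch of that average-pooling baseline is given below; the citing paper replaces it with SVMP, which is not shown. The function name, the symmetric `radius` parameter, and the clipping at video boundaries are assumptions made for illustration.

```python
def window_average(frame_scores, pivot, radius):
    """Average per-class scores over frames [pivot - radius, pivot + radius].

    frame_scores[t][c] is a hypothetical score for class c at frame t.
    The window is clipped to the valid frame range (an assumption).
    """
    if not 0 <= pivot < len(frame_scores):
        raise IndexError("pivot is outside the video")
    start = max(0, pivot - radius)
    end = min(len(frame_scores), pivot + radius + 1)
    window = frame_scores[start:end]
    num_classes = len(window[0])
    return [
        sum(frame[c] for frame in window) / len(window)
        for c in range(num_classes)
    ]
```

Calling `window_average` once per pivot frame yields one smoothed score vector per evaluation point, at the cost of blurring short actions that last fewer frames than the window.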
“…• Asyn-TF (CVPR 2017) [27]: The model attempts to reason over various aspects of activity that includes objects, actions, and intentions.…”
Section: Compared Methods (mentioning)
confidence: 99%