2023
DOI: 10.1109/tcsvt.2023.3250646
Spatio-Temporal Adaptive Network With Bidirectional Temporal Difference for Action Recognition

Cited by 19 publications (4 citation statements) | References 53 publications
“…However, as discussed in [22], spatial information and temporal information should be learned by different cognitive mechanisms and should be synchronized when processing sequential information. The conventional approaches [41], [45], [46], which first extract spatial information from each frame and then fuse it with temporal transformers, may lead to a lack of information synchronization. Besides, in group detection the temporal information, e.g., trajectories, is not just a sequence; it also carries important location information for distinguishing the group results.…”
Section: Spatio-temporal Methods
confidence: 99%
“…The comparison with the state-of-the-art methods on the HMDB51 dataset is shown in Table 6, and we report the mean class accuracy over three splits on the HMDB51 dataset. There are three parts in this table: single modality methods [7], [8], [34], [35], [36], multi-stream fusion methods [8], [37], [38], [39], and the proposed methods. Our FCKD(F) achieves an average accuracy of 81.0% on three common splits, outperforming the baseline 3D ResNeXt101 by 7.1%.…”
Section: E. Comparison With State-of-the-arts
confidence: 99%
“…Our FCKD(F) achieves an average accuracy of 81.0% on three common splits, outperforming the baseline 3D ResNeXt101 by 7.1%. Compared with TSM [34], TCM-R50 [35], and STANet [36], which only use the RGB modality for temporal modeling, our FCKD(F+S+D) has learned motion knowledge from optical flow and outperforms them by 7.5%, 3.5%, and 3.3%, respectively. In addition, the state-of-the-art knowledge distillation methods Mars [8] and D3D [7] also achieve better results than TSM, TCM-R50, and STANet.…”
Section: E. Comparison With State-of-the-arts
confidence: 99%
“…TSM [9] innovatively introduces temporal information into 2D CNNs by moving channels across the time dimension, without additional computational overhead. Subsequent studies [10][11][12][13][14][15] integrate various forms of temporal modeling into 2D CNNs with distinct approaches. MVFNet utilizes separable convolutions for dynamic video features but overlooks the correlation among multiscale spatiotemporal features.…”
Section: Introduction
confidence: 99%
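The channel-shifting idea attributed to TSM in the statement above can be illustrated with a minimal sketch: a fraction of the channels is shifted one step backward in time, another fraction one step forward, and the rest are left in place, with vacated positions zero-filled. This is a NumPy sketch under the common TSM convention of shifting 1/8 of channels in each direction (`fold_div=8`); it is not the cited paper's implementation, and the function name and tensor layout `(T, C, H, W)` are illustrative choices.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the time axis (TSM-style sketch).

    x: array of shape (T, C, H, W). The first C//fold_div channels are
    shifted backward in time (frame t receives frame t+1), the next
    C//fold_div forward (frame t receives frame t-1); the remaining
    channels are untouched. Vacated time steps are zero-filled.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # shift backward in time
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # shift forward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]              # static channels
    return out

# Toy example: 4 frames, 8 channels, 1x1 spatial resolution.
x = np.arange(4 * 8, dtype=float).reshape(4, 8, 1, 1)
y = temporal_shift(x, fold_div=8)
```

Because the shift only reindexes existing activations, it adds temporal interaction to a 2D CNN at essentially zero extra FLOPs, which is the point the statement makes.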