2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01940

DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Cited by 38 publications (16 citation statements)
References 36 publications

“…In order to improve attention efficiency, Ref. [40] proposes a novel directed-attention mechanism to understand human actions in their exact temporal order. A trajectory attention block is proposed in [41] to enhance the robustness of human action recognition in dynamic scenes; it generates a set of trajectory tokens along the spatial dimension and performs pooling along the temporal dimension.…”
Section: Attention Mechanism for Video Understanding (mentioning)
Confidence: 99%
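
The trajectory attention the statement describes can be sketched compactly: per-frame spatial attention yields one trajectory token per frame, and those tokens are pooled along time. The following is a minimal PyTorch illustration of that two-step idea under assumed shapes; the function name, (batch, time, space, dim) layout, and mean pooling are illustrative assumptions, not the cited papers' actual code.

import torch

def trajectory_style_attention(x):
    """Sketch of trajectory-style attention for video tokens.

    x: (B, T, S, D) = (batch, frames, spatial tokens, channels).
    Step 1: every query token attends over the spatial tokens of each
            frame, yielding one trajectory token per frame.
    Step 2: the T trajectory tokens are pooled along time (mean here).
    """
    B, T, S, D = x.shape
    q = x.reshape(B, T * S, D)                       # all tokens act as queries
    traj = []
    for t in range(T):
        k_t = x[:, t]                                # keys/values of frame t: (B, S, D)
        attn = torch.softmax(q @ k_t.transpose(1, 2) / D ** 0.5, dim=-1)
        traj.append(attn @ k_t)                      # trajectory token for frame t
    traj = torch.stack(traj, dim=2)                  # (B, T*S, T, D)
    out = traj.mean(dim=2)                           # pooling along the temporal dimension
    return out.reshape(B, T, S, D)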
“…Truong et al. propose an end-to-end transformer architecture, DirecFormer [71]. The architecture introduces ordinal temporal learning into the transformer, which helps the model understand the chronological order of actions.…”
Section: Transformer Models (mentioning)
Confidence: 99%
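
The ordinal temporal learning idea can be illustrated with a small self-supervised auxiliary task: shuffle the input frames and train a head to recover each frame's original position. This is a minimal sketch of the general concept only; the head, shapes, and loss below are illustrative assumptions, not DirecFormer's actual directed-attention formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OrderHead(nn.Module):
    """Auxiliary head: for each (shuffled) frame feature, predict its
    original temporal index; the cross-entropy over the permutation
    supplies the chronological-order training signal."""
    def __init__(self, dim, num_frames):
        super().__init__()
        self.fc = nn.Linear(dim, num_frames)

    def forward(self, feats, perm):
        # feats: (B, T, D) features of temporally shuffled frames
        # perm:  (B, T) original index of each shuffled frame
        logits = self.fc(feats)                      # (B, T, num_frames)
        return F.cross_entropy(logits.flatten(0, 1), perm.flatten())

# Usage sketch: shuffle frame order, encode, and add the ordinal loss.
B, T, D = 2, 8, 256
perm = torch.stack([torch.randperm(T) for _ in range(B)])  # (B, T)
feats = torch.randn(B, T, D)                               # stand-in for encoder output
loss = OrderHead(D, T)(feats, perm)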
“…This initial recommendation set is pruned and temporally extended using optical flow and transductive learning.…”

Results fragment extracted alongside this statement (each row: method followed by two accuracy figures; the column headers and the first method's name did not survive extraction):

(method truncated)                       58.5%   65.7%
ACT [71]                                 65.7%   69.5%
Faster-RCNN + two-stream I3Dconv [72]    73.3%   76.3%
YOWO (16-frame) [73]                     74.4%   87.2%
HIT [116]                                83.8%   84.8%

Section: Spatiotemporal Action Detection (mentioning)
Confidence: 99%
“…As Vision Transformers brought recent breakthroughs in computer vision, specifically for action recognition tasks, many researchers have adopted them as their model [37,38,39,40] or combined them with 2D CNNs [41]. For example, Arnab et al. [37] proposed several factorization variants to model spatial and temporal representations effectively inside a transformer encoder.…”
Section: Related Work (mentioning)
Confidence: 99%
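
One such factorization variant, spatial attention within each frame followed by temporal attention across frames, can be sketched as follows. This mirrors the general factorized-encoder idea in PyTorch and is not Arnab et al.'s exact architecture; the module name and tensor layout are assumptions for illustration.

import torch
import torch.nn as nn

class FactorizedBlock(nn.Module):
    """Factorized spatiotemporal self-attention: spatial then temporal."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                            # x: (B, T, S, D)
        B, T, S, D = x.shape
        # Spatial attention: tokens within the same frame attend to each other.
        xs = x.reshape(B * T, S, D)
        xs = xs + self.spatial(xs, xs, xs)[0]
        # Temporal attention: each spatial location attends across frames.
        xt = xs.reshape(B, T, S, D).permute(0, 2, 1, 3).reshape(B * S, T, D)
        xt = xt + self.temporal(xt, xt, xt)[0]
        return xt.reshape(B, S, T, D).permute(0, 2, 1, 3)  # back to (B, T, S, D)

Factorizing the two attention stages in this way reduces the quadratic attention cost from (T*S)^2 to T*S^2 + S*T^2, which is the usual motivation for these variants in video transformers.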