2023
DOI: 10.48550/arxiv.2302.01921
Preprint

Transformers in Action Recognition: A Review on Temporal Modeling

Abstract: In vision-based action recognition, spatio-temporal features from different modalities are used to recognize activities. Temporal modeling is a long-standing challenge in action recognition. However, deep learning-based approaches have only a limited set of methods for modeling motion information, such as pre-computed motion features, three-dimensional (3D) filters, and recurrent neural networks (RNNs). Recently, transformers' success in modeling long-range dependencies in natural language processing (NLP) tasks has gotten great att…

Cited by 5 publications (5 citation statements)
References 248 publications (320 reference statements)
“…Moreover, the effects of different spatio-temporal convolutions have been investigated and 3D CNNs outperformed 2D CNNs in the concept of residual learning [36]. The use of 3D filters in combination with other techniques, such as motion-based features and transformers, has improved the accuracy and efficiency of action recognition [37]- [41]. 3D CNNs have the ability to extract distinctive features in spatial and temporal dimensions, requiring simultaneous processing of both types of features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although hybrid techniques have the potential to offer various advantages, there remains a lack of research investigating their complete capabilities. Furthermore, there is a lack of extensive exploration regarding the most effective methods for integrating these approaches, highlighting the necessity for additional investigation and experimentation in this domain [41].…”
Section: Related Work (mentioning)
confidence: 99%
“…The temporality of network conditions and their evolution finds a match in GTMs [296], which can excel in processing sequential data, translating it into a coherent narrative of how 6G network conditions might evolve in response to changes in the environment. This time-series interpretation offers network operators foresight, allowing for timely interventions and adjustments.…”
Section: Temporal Modeling and Prediction (mentioning)
confidence: 99%
“…The temporality of network conditions and their evolution finds a match in GTMs [293], which can excel in processing sequential data, translating it into a coherent narrative of how 6G network conditions might evolve in response to changes in the environment. This time-series interpretation offers network operators foresight, allowing for timely interventions and adjustments.…”
Section: Temporal Modeling and Prediction (mentioning)
confidence: 99%