2022
DOI: 10.1109/access.2022.3144035

An Improved Action Recognition Network With Temporal Extraction and Feature Enhancement

Abstract: Image classification and action recognition are both active research topics in the field of computer vision. However, the development of action recognition is rather slow compared with image classification, due to the difficulties in spatial-temporal information modeling. In this paper, we present TEFE, a deep structure combining temporal extraction with feature enhancement to explore the spatial coherence across the temporal dimension. The temporal extraction (TE) module is used to capture the short-term and long…

Citations: cited by 6 publications (5 citation statements)
References: 47 publications
“…DMTNet achieved a 1% improvement in performance with 8 frames and a 1.8% improvement with 16 frames compared to TANet, which is our baseline that also uses the same attention strategy and adoptive kernel generating for depth-wise convolution. TEFE [34] showed competitiveness at the 16-frame sampling, but DMTNet exhibited significant results for real-time applications such as for 8-frame rapid inference. The results were obtained by testing the validation set.…”
Section: Results On Something-something V1 (mentioning)
confidence: 99%
“…Before starting DMA transmission, DMA request should be issued. When a signal meeting the trigger condition enters the data acquisition card, the logic circuit (FPGA or PLD) on the board will drive the ADC to start sampling [6][7][8]. The timing function of the universal timer is enabled during programming.…”
Section: Sports Information Acquisition (mentioning)
confidence: 99%
“…In [72], a differentiable similarity guided sampling module is introduced in the architecture of 3D-CNNs that measures the similarity of temporal feature maps and adaptively adjusts the temporal resolution. In [1], an efficient architecture is proposed, consisting of a 2D-CNN and two lightweight 1D-CNN-based branches to capture spatial information, short- and long-term motion dynamics, respectively, and a 3D-CNN feature enhancement module to obtain more fine-grained spatial and temporal cues. This architecture is much more efficient from SlowFast, which uses two 3D-ResNets in its branches.…”
Section: Top-down Approaches (mentioning)
confidence: 99%
“…, $N$. The feature representations are stacked row-wise to obtain matrix $\Gamma \in \mathbb{R}^{N \times F}$, $\Gamma = [\gamma^{(1)}, \ldots$…”
Section: Video Gat a Video Representation (mentioning)
confidence: 99%
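
The row-wise stacking mentioned in the last excerpt can be illustrated with a minimal sketch. The excerpt is truncated, so this is not the citing paper's code: the function name `stack_rows`, the variable `gammas`, and the toy values are all hypothetical, and the sketch only assumes that there are N feature vectors, each of length F.

```python
from typing import List


def stack_rows(features: List[List[float]]) -> List[List[float]]:
    """Stack N feature vectors of length F row-wise into an N x F matrix.

    Hypothetical helper for illustration only; it is not taken from the
    citing paper.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    width = len(features[0])
    for i, vec in enumerate(features):
        # Every row must have the same length F for the matrix to be valid.
        if len(vec) != width:
            raise ValueError(
                f"vector {i} has length {len(vec)}, expected {width}"
            )
    # Copy each vector so the matrix does not alias the caller's lists.
    return [list(vec) for vec in features]


# Toy data (assumed values): N = 3 feature vectors, each with F = 4 entries.
gammas = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
Gamma = stack_rows(gammas)
shape = (len(Gamma), len(Gamma[0]))  # (N, F)
```

Here row i of `Gamma` holds the i-th feature vector, so the matrix has one row per vector and one column per feature dimension.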