2021
DOI: 10.1016/j.neucom.2021.03.120
|View full text |Cite
|
Sign up to set email alerts
|

3D-TDC: A 3D temporal dilation convolution framework for video action recognition

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

0
4
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
8
1

Relationship

0
9

Authors

Journals

citations
Cited by 21 publications
(4 citation statements)
references
References 52 publications
0
4
0
Order By: Relevance
“…+e methods based on traditional handcrafted features have relatively low accuracy, poor robustness, and universality and are suitable for relatively simple gesture and action recognition with low hardware requirements but are difficult to handle human behavior recognition in complex scenes. +erefore, in general, the traditional method of feature extraction has poor generalization and is complicated to implement [12][13][14][15][16].…”
Section: Introductionmentioning
confidence: 99%
“…+e methods based on traditional handcrafted features have relatively low accuracy, poor robustness, and universality and are suitable for relatively simple gesture and action recognition with low hardware requirements but are difficult to handle human behavior recognition in complex scenes. +erefore, in general, the traditional method of feature extraction has poor generalization and is complicated to implement [12][13][14][15][16].…”
Section: Introductionmentioning
confidence: 99%
“…The 3D CNN-based framework has spatiotemporal modeling capabilities and improves the performance of video action recognition models. 3D ConvNets [31,3,11,24] extended 2D image models [14,36] to the spatial-temporal domain, treating spatial and temporal dimensions in the same way. C3D [31] stacked spatiotemporal convolution kernels to efficiently represent video dense structure.…”
Section: D Cnnmentioning
confidence: 99%
“…3-D CNN directly accepts continuous video frames as input. It uses a 3-D convolution kernel to extract spatial and temporal features simultaneously through series convolution, which shows greater advantages than 2-D CNN in the field of video processing [29], [30]. The 3-D CNN was first proposed by Ji et al [31] and applied to human behavior recognition.…”
Section: B Video Detection Of Pipeline Leakagementioning
confidence: 99%