2021
DOI: 10.1007/s11042-021-10795-2

Binary dense sift flow based two stream CNN for human action recognition

Abstract: Two-stream CNN is a widely used network for human action recognition. It consists of a spatial stream and a temporal stream. The spatial stream, through which the RGB image passes, extracts the shape features of human motion. The temporal stream, through which the optical flow images pass, extracts the sequence features of the listed motions. However, because of the constraints of the optical flow, such as brightness constancy and piecewise smoothness, there are limitations to the performance of …
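The two-stream idea in the abstract can be illustrated with a minimal sketch of late fusion, where each stream produces per-class scores and the two are combined before the final decision. The abstract does not say how this paper fuses its streams, so the weighted averaging below is an assumption, and the names `softmax`, `fuse_two_streams`, `predict`, and `temporal_weight` are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Weighted average of the per-class probabilities of both streams.

    Assumption: late fusion by averaging; the source abstract does not
    specify the fusion rule used in the paper.
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_scores)    # from the RGB (shape) stream
    p_temporal = softmax(temporal_scores)  # from the flow (motion) stream
    w = temporal_weight
    return [(1 - w) * ps + w * pt for ps, pt in zip(p_spatial, p_temporal)]


def predict(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_two_streams(spatial_scores, temporal_scores, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up scores for three action classes, for illustration only.
    print(predict([2.0, 1.0, 0.1], [0.1, 3.0, 0.2]))
```

Setting `temporal_weight` to 0.0 or 1.0 reduces the sketch to a single stream, which is a quick way to see how much each stream contributes to the fused decision.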

Citations: cited by 13 publications (18 citation statements)
References: 45 publications
“…This section describes the proposed technique's comparison results, in which our novel technique is compared to baseline approaches such as volumetric Spatiograms of either the Local Binary Pattern (VS-LBP) [45], Local Binary Pattern (LBP) [46], Temporal Pyramid Matching of the Local Binary Pattern (TPM-LBP) [47], Pyramid Histogram of Gradients (PHOG) [48], as well as Scale Invariant Feature Transform (SIFT) [49].…”
Section: Comparison Results (mentioning)
confidence: 99%
“…Accuracy (%): VS-LBP [45], 92.7; LBP [46], 91.5; TPM-LBP [47], 96.5; PHOG [48], 94.6; SIFT [49], 97.6; Proposed Method, 99.5…”
Section: Methods (mentioning)
confidence: 99%
“…The convolution layer is responsible for extracting the local features of the input image to obtain different feature maps; the pooling layer reduces the dimension of the extracted features of the convolution layer to retain important information while reducing the risk of overfitting due to nonessential information. Common pooling layer settings include average pooling and maximum pooling; the full connection layer plays a classification role in the network and enables sample data classification by mapping the learned feature data to the space of sample markers [6][7][8][9][10][11].…”
Section: Network Profile (mentioning)
confidence: 99%
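The pooling step described in the citation statement above can be sketched in a few lines. This is only an illustration of maximum and average pooling over non-overlapping windows, not code from the cited paper; the function name `pool2d`, the 2x2 window, and the sample feature map are assumptions.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample a 2D feature map with non-overlapping size x size windows.

    mode="max" keeps the strongest response in each window;
    mode="avg" keeps the mean response. Rows and columns that do not
    fill a complete window are dropped.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value inside the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        pooled.append(out_row)
    return pooled


if __name__ == "__main__":
    # Made-up 4x4 feature map, for illustration only.
    fmap = [[1, 3, 2, 0],
            [4, 6, 1, 1],
            [0, 2, 9, 5],
            [1, 1, 3, 7]]
    print(pool2d(fmap, mode="max"))  # [[6, 2], [2, 9]]
    print(pool2d(fmap, mode="avg"))  # [[3.5, 1.0], [1.0, 6.0]]
```

Both modes shrink the 4x4 map to 2x2, which is the dimension reduction the statement refers to; they differ only in which summary of each window is kept.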
“…CNN has advantages in terms of computing speed and can form a good balance between accuracy and effectiveness in short videos. In order to take advantage of CNN's strengths and make up for its shortcomings, CNN has progressed from the previous 2D-CNN and 3D-CNN to the current two-stream CNN [7][8][9][10]. This has achieved good results.…”
Section: Introduction (mentioning)
confidence: 99%