2019
DOI: 10.1007/s11263-019-01248-3

Semantic Image Networks for Human Action Recognition

Abstract: In this paper, we propose the use of a semantic image, an improved representation for video analysis, principally in combination with Inception networks. The semantic image is obtained by applying localized sparse segmentation using global clustering (LSSGC) prior to the approximate rank pooling which summarizes the motion characteristics in single or multiple images. It incorporates the background information by overlaying a static background from the window onto the subsequent segmented frames. The idea is t…

Cited by 34 publications (21 citation statements)
References 70 publications (111 reference statements)

Citation statements
“…The main difference between the existing studies [46,47] and the proposed method lies in the location of separating the object and the background. Specifically, [46,47] synthesizes the features of discriminative action at the network input stage, while the proposed method adaptively learns discerning features within the network. Section 4.C experimentally analyzes this phenomenon.…”
Section: Action Recognition With Dynamic Inputs (mentioning)
confidence: 97%
“…Several techniques have been proposed to distinguish between objects and backgrounds by appropriately manipulating input information [46,47]. For example, the dynamic image [46], which is an image created by synthesizing appearance and dynamics, i.e., static and temporal information, can express the temporal information of a single RGB image, and multimodal input based on this can contribute to improve the performance of the action recognition task.…”
Section: Action Recognition With Dynamic Inputs (mentioning)
confidence: 99%
“…Bilen, H et al [7] use the pre-trained 2D CNN to recognize human actions, they introduce the concept of dynamic image, which is obtained by Rank Pooling [8] on video frames. Sunder Ali Khowaja and Seok-Lyong Lee [9] propose the semantic image for video analysis, which is obtained by applying localized sparse segmentation using global clustering prior to the approximate rank pooling. Recently, more and more 3D CNN are applied in the action recognition, like C3D [10] (Convolutional 3D), I3D (Inflated-3D) [11], ResNet [12] and so forth, where ResNet can be divided into 2D ResNet and 3D ResNet, and all of them have a common concept, that is Residual Learning.…”
Section: Introduction (mentioning)
confidence: 99%
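
Illustration: the statements above describe dynamic and semantic images as summaries of a clip produced by (approximate) rank pooling. As a rough sketch of that pooling step only, the Python below collapses a stack of frames into one image using the simplified linear weights alpha_t = 2t - T - 1, a form commonly quoted for approximate rank pooling. The text here does not state which weights the paper uses, so this choice is an assumption, and the LSSGC segmentation and background overlay are not modelled. The function names `approximate_rank_pooling_coefficients` and `summarize_clip` are hypothetical.

```python
import numpy as np


def approximate_rank_pooling_coefficients(num_frames: int) -> np.ndarray:
    """Return weights alpha_t = 2t - T - 1 for t = 1..T.

    Assumption: the simplified linear form of approximate rank pooling.
    Later frames get positive weight, earlier frames negative weight.
    """
    t = np.arange(1, num_frames + 1, dtype=np.float64)
    return 2.0 * t - num_frames - 1.0


def summarize_clip(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one (H, W, C) uint8 image."""
    frames = np.asarray(frames, dtype=np.float64)
    alpha = approximate_rank_pooling_coefficients(frames.shape[0])
    # Weighted sum over the time axis.
    pooled = np.tensordot(alpha, frames, axes=([0], [0]))
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        # A static clip carries no temporal ordering signal.
        return np.zeros(pooled.shape, dtype=np.uint8)
    # Min-max rescale so the result can be fed to an image network.
    return np.round(255.0 * (pooled - lo) / (hi - lo)).astype(np.uint8)
```

Because the weights sum to zero, pixels that never change cancel out, and only pixels whose values evolve over the clip survive in the summary image. That is the sense in which a single image can carry temporal information.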