2014
DOI: 10.1016/j.cviu.2014.01.002
Action recognition using global spatio-temporal features derived from sparse representations

Abstract: Recognizing actions is one of the important challenges in computer vision with respect to video data, with applications to surveillance, diagnostics of mental disorders, and video retrieval. Compared to other data modalities such as documents and images, processing video data demands orders of magnitude higher computational and storage resources. One way to alleviate this difficulty is to focus the computations on informative (salient) regions of the video. In this paper, we propose a novel global spatio-temporal …

Cited by 46 publications (27 citation statements), published between 2014 and 2021. References: 55 publications.

Citation statements (ordered by relevance):
“…Extraction of salient region: in order to extract salient/important features, some commonly used methods are the STIP method [8], sparse coding [7], and the STFT. Here, no extraction of the human area is done.…”
Section: Extracting Area of Importance/Target
confidence: 99%
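The statement above contrasts saliency-based front ends (STIP, sparse coding, STFT) with approaches that skip region extraction. As a rough illustration of the sparse-coding route [7], here is a minimal Python sketch that scores video patches by their sparse-reconstruction error; the patch size, dictionary size, and the error-based saliency criterion are assumptions for illustration, not the cited paper's exact method.

# Hedged sketch: saliency scoring of video patches via sparse coding.
# Patch/dictionary sizes and the reconstruction-error criterion are
# illustrative assumptions, not the authors' exact method.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
# Stand-in for vectorized spatio-temporal patches (500 patches, 64 dims).
patches = rng.standard_normal((500, 64))

# Learn a dictionary from the patches and sparse-code them with OMP.
dico = DictionaryLearning(n_components=32, transform_algorithm="omp",
                          transform_n_nonzero_coefs=5, max_iter=20,
                          random_state=0)
codes = dico.fit_transform(patches)

# Patches that are poorly reconstructed by their sparse code are
# treated as salient (high residual = unusual = informative).
recon = codes @ dico.components_
saliency = np.linalg.norm(patches - recon, axis=1)
salient_idx = np.argsort(saliency)[-50:]  # keep the 50 most salient patches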
“…However, how to fuse or integrate all streams is still an open question. Before deeply learned features became popular, there were many research approaches to video classification, especially handcrafted methods such as spatiotemporal features [1], dense trajectories [9], and local autocorrelation [19]. Three-dimensional (3D) CNN was the first attempt to train spatiotemporal features for video classification using deep CNNs.…”
Section: Related Work
confidence: 99%
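Since the statement above points to 3D CNNs as the first attempt to learn spatiotemporal features directly, a minimal sketch of that idea may help. This uses PyTorch; the layer sizes, clip shape, and class count are illustrative assumptions, not any specific published architecture.

# Hedged sketch: a tiny 3D CNN that convolves jointly over time and space.
# All sizes here are assumptions chosen for brevity.
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # (time, H, W) conv
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, channels, frames, H, W)
        h = self.features(x).flatten(1)
        return self.classifier(h)

clip = torch.randn(2, 3, 16, 64, 64)  # two 16-frame RGB clips
logits = Tiny3DCNN()(clip)            # -> (2, 10) class scores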
“…1 Graduate School of Information Engineering, Hiroshima University, Higashi Hiroshima, Japan. 2 Department of Information Engineering, Hiroshima University, Higashi Hiroshima, Japan.…”
Section: Authors' Information
confidence: 99%
“…We are using the seven translation […]. After the two feature sets are computed, the feature vectors are combined and encoded into a single code using the bag-of-features algorithm [13]. Unsupervised K-means clustering and the supervised K-nearest neighbor (KNN) classifier are used to classify the different actions in videos.…”
Section: Overview of Our Approach
confidence: 99%
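As a rough sketch of the pipeline this statement describes (bag of features [13] with unsupervised K-means for the codebook and supervised KNN for classification), the following Python fragment encodes per-video descriptor sets into codeword histograms and classifies them. The descriptor dimensionality, codebook size, and random stand-in data are assumptions for illustration.

# Hedged sketch: bag-of-features encoding + KNN classification.
# Descriptor dim (16), codebook size (32), and toy data are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in local descriptors: one (n_descriptors, 16) array per video.
videos = [rng.standard_normal((rng.integers(50, 100), 16)) for _ in range(20)]
labels = rng.integers(0, 3, size=20)  # toy action labels

# 1) Build a visual vocabulary by clustering all descriptors (unsupervised).
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(np.vstack(videos))

# 2) Encode each video as a normalized histogram of codeword assignments.
def encode(desc):
    hist = np.bincount(kmeans.predict(desc), minlength=32).astype(float)
    return hist / hist.sum()

X = np.array([encode(v) for v in videos])

# 3) Classify the histograms with K-nearest neighbors (supervised).
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print(knn.predict(X[:2]))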