2013 IEEE International Conference on Computer Vision Workshops
DOI: 10.1109/iccvw.2013.61

Ordered Trajectories for Large Scale Human Action Recognition

Abstract: Recently, a video representation based on dense trajectories has been shown to outperform other human action recognition methods on several benchmark datasets. In dense trajectories, points are sampled at uniform intervals in space and time and then tracked using a dense optical flow field. The uniform sampling does not discriminate objects of interest from the background or other objects. Consequently, a lot of information is accumulated that may not actually be useful. Sometimes, this unwanted information …
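For intuition about the mechanism the abstract describes, here is a minimal Python/OpenCV sketch of uniform sampling followed by dense-flow tracking. It is a toy illustration under assumptions: Farneback flow stands in for the dense optical flow field, and the grid step and track length are made-up parameters; the actual dense-trajectory pipeline adds refinements (e.g. trajectory pruning) not shown here.

```python
import cv2
import numpy as np

def dense_trajectory_sketch(frames, step=5, track_len=15):
    """Toy sketch: sample points on a uniform spatial grid, then advect
    them frame-to-frame through a dense optical flow field.
    `step` and `track_len` are illustrative, not the paper's values."""
    h, w = frames[0].shape[:2]
    # Uniform spatial sampling: one point every `step` pixels.
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    tracks = [[(float(x), float(y))] for x, y in zip(xs.ravel(), ys.ravel())]

    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:track_len + 1]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense flow between consecutive frames (Farneback as a stand-in).
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        for tr in tracks:
            x, y = tr[-1]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < w and 0 <= yi < h:
                dx, dy = flow[yi, xi]
                tr.append((x + dx, y + dy))  # follow the flow vector
        prev = gray
    return tracks  # every grid point yields a track, background included
```

Note that the sketch tracks every grid point indiscriminately, which is exactly the behavior the abstract criticizes: background points produce trajectories just as readily as points on the actor.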

Cited by 32 publications (19 citation statements)
References 28 publications

“…Specifically, our third approach outperforms other state-of-the-art methods with an overall accuracy of 75.05%, averaged over the three training/testing splits. This is slightly better than [23,18], which reported average accuracies of 73.39% and 73.10%, and markedly better than the work of Karpathy et al., based on Convolutional Neural Networks (CNNs) and presented at CVPR 2014 [10]. In addition, the confusion matrix of the third approach for the UCF101 dataset is shown in Figure 3.…”
Section: Classification Results
confidence: 68%
“…The Bag of Visual Words (BoVW) framework with local features and its variants [48,57,20,29,34] have dominated research on action recognition and shown their effectiveness in the recent THUMOS'13 Action Recognition Challenge [18]. As shown in Figure 1, the BoVW pipeline for video-based action recognition consists of five steps: (i) feature extraction, (ii) feature pre-processing, (iii) codebook generation, (iv) feature encoding, and (v) pooling and normalization.…”
Section: Introduction
confidence: 99%
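As a concrete illustration of steps (ii)–(v) of the pipeline enumerated above, here is a minimal scikit-learn sketch (step (i), feature extraction, is assumed to have already produced one descriptor array per video). All names and sizes are hypothetical, and the hard-assignment histogram in step (iv) is the simplest possible stand-in for the encoding step; this is not the configuration of any cited paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def build_bovw_encoder(train_descs, k=256, pca_dim=64):
    """train_descs: list of (n_i, d) local-descriptor arrays, one per video.
    `k` and `pca_dim` are toy values; real systems often use k ~ 4000."""
    stacked = np.vstack(train_descs)
    # (ii) feature pre-processing: PCA-reduce the raw descriptors.
    pca = PCA(n_components=pca_dim).fit(stacked)
    # (iii) codebook generation: k-means visual vocabulary.
    codebook = KMeans(n_clusters=k, n_init=1).fit(pca.transform(stacked))

    def encode(descs):
        # (iv) feature encoding: hard-assign each descriptor to a word.
        words = codebook.predict(pca.transform(descs))
        hist = np.bincount(words, minlength=k).astype(float)
        # (v) pooling and normalization: sum-pool into a histogram, then L2-normalize.
        return hist / (np.linalg.norm(hist) + 1e-10)

    return encode
```

The resulting fixed-length vectors can be fed to any standard classifier (e.g. a linear SVM); in practice, Fisher vector or VLAD encodings usually replace the hard-assignment histogram at the encoding step.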
“…The highest accuracy, 87.90%, is obtained by Simonyan et al. [56] with discriminatively trained deep Convolutional Networks, but at the price of significantly higher computational complexity. We still obtain better results than Soomro et al. [26], Jain et al. [21], Karpathy et al. [25], and Murthy et al. [38], who obtain 43.90%, 52.10%, 65.40%, and 73.1%, respectively. However, using the frame-based features, we obtain lower results than the best entries of the THUMOS competition: in [65] the authors propose a framework that incorporates dense trajectories (HoG-3D / HoF-3D / MBH) and spatio-temporal pyramids with a modified version of the Fisher Kernel representation; in [24] the authors propose a Bag-of-Features pipeline combining local SIFT pyramids (P-SIFT), opponent color keyframes (P-OSIFT), and HoG-3D / HoF-3D / MBH.…”
Section: Comparison With the Baseline Versions
confidence: 37%
“…Comparison with the baseline versions of the proposed approach: accuracy values (UCF101 [26] dataset; the best results are presented in bold) … as STIPs and dense trajectory features; in [38] the authors propose a new technique to match dense trajectories and remove those that contain background motion noise; the authors in …”
confidence: 99%